The effects of non-native interactions on protein folding rates: Theory and simulation 
Proteins are minimally frustrated polymers. However, for realistic protein models non-native interactions 
must be taken into account. In this paper we analyze the effect of non-native interactions on the folding rate 
and on the folding free energy barrier. We present an analytic theory to account for the modification of the 
free energy landscape upon introduction of non-native contacts, added as a perturbation to the strong native 
interactions driving folding. Our theory predicts a rate-enhancement regime at fixed temperature, under the 
introduction of weak, non-native interactions. We have thoroughly tested this theoretical prediction with 
simulations of a coarse-grained protein model, by employing an off-lattice Cα model of the src-SH3 domain. The 
strong agreement between results from simulations and theory confirms the nontrivial result that a relatively 
small amount of non-native interaction energy can actually assist the folding to the native structure. 

I. INTRODUCTION 

The mechanism of protein folding is of central importance to structural and functional biology (see e.g. [1, 2, 3, 4, 5]). 
An understanding of the fundamental physical-chemical factors regulating the folding process may help provide answers to 
some of the long outstanding problems in both functional genomics and biotechnology: rational design of drugs and enzymes, 
potential control of genetic diseases, and a deeper understanding of the connection between biological structure and function 
are among the applications that may benefit from advances in protein folding. 

Theoretical and computational studies have recently achieved noticeable success in reproducing various features of the 
folding mechanisms of several small to medium-sized fast-folding proteins (see e.g. [6, 7, 8, 9, 10, 11, 12, 13]); at the same 
time, the improved spatial and temporal resolution of recent experimental techniques is now allowing researchers to combine 
theoretical and experimental data to give a more robust characterization of the folding free energy 
landscape [14, 15, 16, 17, 18, 19]. 

However, in spite of these recent successes, a microscopically detailed observation of the individual conformational 
motions that occur during folding remains elusive. A knowledge of the time-dependence of every degree of freedom in the 
system is, however, not of inherent interest, since no additional insight into the underlying physics of the folding process is 
gained from this information by itself. Nor is any particular degree of freedom especially important to folding, because the 
transition involves the cooperation of many weakly (non-covalently) interacting constituents. For these reasons a statistical 
description of the process of folding, in terms of the behavior of an ensemble of systems, is appropriate for distinguishing 
general (self-averaging) properties from sequence-specific ones [20]. The characterization of the folding process in statistical 
mechanical terms can pinpoint crucial questions that may be computationally or experimentally addressed in more detail. 

The idea of considering ensemble properties to characterize the folding landscape underpinned studies of the transition state 
and folding mechanism as arising from the native state topology [11, 21, 22, 23, 24, 25, 26, 27, 28]. As a general rule, 
the transition state structure does not differ dramatically between homologous proteins [29, 30, 31], and any exceptions are 
fairly readily explained [32, 33]. Consistent with the above-mentioned notions of self-averaging, folding rates of homologous 
proteins are seldom seen to differ by more than an order of magnitude when tuned to the same stability [34, 35]. This 
indicates that the folding free energy barrier is not particularly sensitive to the details of sequences folding to a given native 
structure, but depends rather on more general features of that ensemble of sequences, including the kinetic accessibility of 
that native structure. In this sense, the topology of the native structure largely determines the folding free energy barrier for 
those homologous sequences [36]. 

These ideas motivated many studies of folding rates and mechanisms using so-called Gō models [37], which neglect 
interactions not present in the native state. In these studies the possibility of structure prediction is traded for the possibility of 
rate and mechanism prediction. Moreover, because of the robustness of rate and mechanism for homologous proteins, the 
coarse-graining of the Gō model (i.e. removing the molecular details of side-chains and solvent) is often assumed to be a reasonable 
approximation. 
Topology-based approaches seek to predict mechanism by calculating φ-values [37, 38] or analogous quantities, which in 
an accurate theory give values that correlate with experiment for the measured cases. Occasionally one finds residues whose 
φ-values are negative. This is most likely due to the presence of non-native contacts that stabilize the transition state, but 
cannot be present in the native state. The presence of non-native interactions in the transition state is supported by all-atom 
simulations using a CHARMM-based effective energy function, where it was found that about 20-25% of the energy in the 
transition state arose from non-native contacts [39]. 

Hence for a more realistic protein potential energy function, non-native interactions must be taken into account. In this paper 
we analyze the effect of increasing the strength of non-native interactions on the folding rate as well as the free energy barrier. 
Non-native interactions are introduced as additional contacts between pairs of residues not in contact in the native structure, 
which are allowed to have a non-zero mean and a non-zero variance. The non-native interactions are added perturbatively 
to the Gō model: all non-native contacts are given a random energy with mean $\epsilon_{NN}$ and a variance $b^2$ which is progressively 
increased to examine more frustrated proteins, while the native contact energies are all held fixed at the same value. The 
limiting case of $\epsilon_{NN} = 0$ and $b = 0$ corresponds to the plain Gō model. This procedure essentially preserves the stability of 
the native state, where approximately no non-native interactions are present. However, the stability of the unfolded state is 
lowered (as shown in §II C and §III B of this paper). 

At first glance one would expect that introducing progressively larger non-native contact energies to an otherwise 
energetically unfrustrated Gō protein would slow the folding rate, for straightforward reasons: it would seem that "noise" in the 
system would make the native basin harder to recognize. One might argue by analogy that it is easier to read a page of 
text without random misspellings. However, the folding rate has been predicted to initially increase under the introduction 
of weak, non-native interactions, added as a perturbation to the strong native interactions driving folding [40]. This was a 
fold-independent result derived from general principles of energy landscape theory. This prediction was subsequently verified 
in simulations of a 36-mer lattice model [41], as well as off-lattice molecular dynamics simulations of Crambin, in which 
attractive non-native contacts were successively added [42]. Independently, it was found that non-native interactions were 
present in the transition state of a 28-mer lattice-model protein with side-chains, and increased the folding rate when 
strengthened [43]. Similar observations were also made in 2-dimensional 24-mer lattice models [44]. A different computational study 
on a 36-mer lattice-model protein found that at the temperature of fastest folding in simulation models, the folding rate 
monotonically decreases with increasing ruggedness [41] (the temperature of fastest folding of course varies with the ruggedness). 
However, this typically barrierless regime is rarely seen in the laboratory [45, 46]. 

The prediction that strengthening non-native interactions that were initially weak would accelerate folding is also consistent 
with experimental observations that strengthening non-specific hydrophobic stabilization in the α-spectrin Src homology 3 (SH3) 
domain sped up folding (and unfolding) for that protein [47]. This result was significantly non-trivial, to the extent that the 
experimental observation was originally interpreted (mistakenly) as evidence against the energy landscape theory. 

In this paper, we test this prediction with simulations of a coarse-grained protein model, by employing an off-lattice Cα 
model (see e.g. [26, 48]) of the SH3 domain of src tyrosine-protein kinase (src SH3). We use a Hamiltonian function 
that has tunable amounts of non-native energy (see Appendix D for details). The results from simulations are compared 
with the predictions of an improved version of the existing theory [40]. The theory is improved by introducing a finite-size 
treatment of packing fraction as a function of polymer length, which takes better account of the polymer physics involved in 
collapse as folding progresses. Moreover, the previous study treated the rate enhancement at fixed stability. Here we show 
a perhaps even less intuitive result, namely that the rate enhancement can happen at fixed temperature, and we derive the 
conditions required for this to happen. 

As the strength of non-native interactions is increased to larger values, we find that eventually the folding rate decreases 
drastically, as expected. In the limit of large non-native contact energies, the chain behaves like a random heteropolymer, 
having misfolded structures more stable than the native state. 

The folding mechanism is also non-trivially affected by the introduction of non-native interactions. In this regard, the 
analysis of the robustness of the folding mechanism against an increasingly strong perturbation of the non-native interactions 
can provide a critical assessment of the validity of unfrustrated protein models for the prediction of folding mechanism, for 
different protein topologies. This analysis goes beyond the scope of the present paper and will be addressed separately [49]. 

The paper is organized as follows. In the next section (§II) we present the theory. After presenting the general ideas and 
overall strategy (§II A), we discuss in detail how an explicit expression for the conformational entropy can be obtained in 
terms of the packing fraction (§II B). We use this result to show how the thermodynamic free energy barrier is lowered by the 
presence of non-native interactions (§II C). In section III we test the theoretical predictions with direct simulation of the 
src-SH3 domain. We first compare the definition of reaction coordinates and the relative approximations of theory and simulations 
(§III A); thermodynamic (§III B and §III C) and kinetic quantities (§III D) obtained from simulations are then quantitatively 
compared with the corresponding theoretical predictions. 

The strong agreement between results from simulations and theory confirms the nontrivial result that a relatively small 
amount of non-native interaction energy can actually assist the folding to the native structure. 
II. THEORY OF FOLDING WITH NON-NATIVE INTERACTIONS 



A. Definition of the general strategy 

Thermodynamic quantities relevant to folding may be obtained from an analysis of the density of states in the presence of 
energetic correlations [4, 5]. In this context we introduce two order parameters. We let Q be the fraction of contacts shared 
between an arbitrary structure and the native structure, and we let A be the fraction of possible non-native contacts present in 
that structure, i.e. the number of non-native contacts divided by the total possible number of non-native contacts. These two 
order parameters are natural for the study of non-native interactions in protein folding. Both take on values between zero and 
unity. 
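As an illustration of these definitions, the following minimal sketch computes Q and A from sets of residue pairs. The function name, the `min_separation` cutoff, and the choice to normalise A by all non-native pairs allowed by that cutoff are assumptions made for the example; they are not taken from the paper.

```python
from itertools import combinations

def order_parameters(contacts, native_contacts, n_residues, min_separation=3):
    """Return (Q, A) for one structure.

    contacts        : set of residue pairs (i, j), i < j, in contact in the structure
    native_contacts : set of residue pairs in contact in the native structure
    min_separation  : hypothetical sequence-separation cutoff for countable pairs
    """
    # Every pair that is allowed to count as a contact in this toy definition.
    possible = {(i, j) for i, j in combinations(range(n_residues), 2)
                if j - i >= min_separation}
    non_native_possible = possible - native_contacts

    # Q: fraction of native contacts present in the structure.
    q = len(contacts & native_contacts) / len(native_contacts)
    # A: fraction of the possible non-native contacts present in the structure.
    a = len(contacts & non_native_possible) / len(non_native_possible)
    return q, a

# Toy 6-residue example (all pairs are made up for illustration).
native = {(0, 3), (0, 4), (1, 5)}
structure = {(0, 3), (1, 5), (2, 5)}
q, a = order_parameters(structure, native, n_residues=6)
```

Both values lie between zero and unity by construction, as required of the order parameters.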

There are several relevant energy and entropy scales governing the thermodynamics of folding. Let the energy of the native 
structure be given by $E_N$. Let the total number of contact interactions in a fully collapsed polymer globule be given by $M$. 
Asymptotically, $M$ scales like the total number of residues in the chain, $N$, essentially because surface terms are negligible 
compared to the bulk. However, for a finite-size system, the mean number of contacts per residue (native or non-native), i.e. 
the coordination number $z$, is itself a function of $N$. We can write the native energy as 

$$E_N = M\epsilon = zN\epsilon , \tag{1}$$

where $\epsilon$ is the mean native attraction energy ($\epsilon < 0$); i.e. the native state is assumed to be fully collapsed 
with the maximal number of contacts, and this is the maximal number of total contacts of a fully collapsed polymer globule. 
We neglect here the separate effects that arise from the variance in the native interaction energies: $\delta\epsilon^2 = 0$. 

Let the conformational entropy of an ensemble of polymer structures characterized by the order parameters $Q$ and $A$ be 
given by $S_c(Q,A)$. We can write the entropy in terms of the entropy per residue $s_c(Q,A)$ as 

$$S_c(Q,A) = N s_c(Q,A) = M s_c(Q,A)/z . \tag{2}$$

In addition to the energy scales $\epsilon$ and $\delta\epsilon^2$ governing native contacts, there are also two energy scales governing non-native 
interactions. One is the mean energy of a non-native interaction, $\epsilon_{NN}$, and the other is the energetic variance of non-native 
interactions, $b^2$. We keep both of these terms, as they enter the analysis on essentially the same footing. For configurations 
with $MA$ non-native contacts, the total non-native energy is taken to be Gaussianly distributed with mean $MA\,\epsilon_{NN}$ and variance 
$MA\,b^2$. Both of these terms contribute to the overall ruggedness of the energy landscape by favoring non-native configurations. 

The strength of non-native interactions is taken to be weak, so that 

$$b/\epsilon \ll 1 \tag{3a}$$
$$\epsilon_{NN}/\epsilon \ll 1 \tag{3b}$$

are both satisfied. Condition (3a) implies that the ratio of the folding transition temperature $T_F$ to the thermodynamic glass 
temperature $T_G$ is large [50], 

$$T_F/T_G \gg 1 , \tag{4}$$

i.e. the proteins we consider are strongly (but not infinitely) unfrustrated: we are perturbing away from the Gō model. 
Condition (3b) implies that collapse and folding occur concurrently [51], i.e. 

$$T_F/T_C \gg 1 , \tag{5}$$

where $T_C$ is the temperature below which non-native states tend to be collapsed. For a given choice of non-native interaction 
energies, the energies of configurations for the ensemble of states characterized by $(Q,A)$ are assumed to be Gaussianly distributed 
with a mean of $QM\epsilon + AM\epsilon_{NN}$ and a variance of $AMb^2$. Then the extensive part of the log number of states having energy $E$ 
and order parameters $(Q,A)$ is given by 

$$\log n(E,Q,A) = S_c(Q,A) - \frac{\left(E - QM\epsilon - AM\epsilon_{NN}\right)^2}{2AMb^2} . \tag{6}$$



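From the definition of equilibrium temperature, $T^{-1} = \partial S/\partial E$, one can then find the thermal energy, entropy, and free energy, 
which are given by (in units where $k_B = 1$): 

$$\frac{E}{M} = \epsilon Q + \left(\epsilon_{NN} - \frac{b^2}{T}\right) A \tag{7a}$$

$$\frac{S}{M} = \frac{s_c(Q,A)}{z} - \frac{b^2}{2T^2}\, A \tag{7b}$$

$$\frac{F}{M} = \epsilon Q - \frac{T}{z}\, s_c(Q,A) + \left(\epsilon_{NN} - \frac{b^2}{2T}\right) A . \tag{7c}$$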
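For readers who want the intermediate steps, the thermal quantities can be sketched from the Gaussian form of the density of states as follows. The shorthand $\bar{E} = QM\epsilon + AM\epsilon_{NN}$ is introduced here only for convenience, and the sketch assumes $\log n = S_c(Q,A) - (E-\bar{E})^2/(2AMb^2)$ with $k_B = 1$.

```latex
\begin{align*}
\frac{1}{T} &= \frac{\partial \log n}{\partial E}
             = -\frac{E-\bar{E}}{AMb^{2}}
  \quad\Longrightarrow\quad
  E = \bar{E} - \frac{AMb^{2}}{T}, \\
S &= \log n \,\Big|_{E = E(T)}
   = S_c(Q,A) - \frac{AMb^{2}}{2T^{2}}, \\
F &= E - TS
   = \bar{E} - T\,S_c(Q,A) - \frac{AMb^{2}}{2T}.
\end{align*}
```

Dividing each line by $M$ and using $S_c = M s_c/z$ gives the per-contact energy, entropy, and free energy.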

These expressions can be understood straightforwardly. In the absence of non-native interactions ($\epsilon_{NN} = b = 0$), the thermal 
energy is just the energy per native contact times the number of native contacts, and the entropy is just the configurational 
entropy. When non-native energies are present, just as $\epsilon$ couples to the order parameter $Q$, so does $\epsilon_{NN}$ couple to the order parameter 
$A$. When non-native energies have a variance, the lower energy conformations (with stronger non-native contacts) tend to be 
thermally occupied. This is why $\epsilon_{NN}$ and $-b^2/T$ enter on the same footing in the energy. The fact that the system spends more 
time in fewer states means that the thermal entropy is reduced. However, the entropy (times temperature) is only reduced by 
half as much as the energy, so there is a residual contribution to the free energy $E - TS$ due to the variance of non-native 
interactions. 
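The bookkeeping described above can be checked numerically with a short sketch. The function name and all parameter values are illustrative assumptions; the configurational entropy per residue `s_c` is passed in as a number rather than computed from a polymer model.

```python
def thermo_per_contact(q, a, s_c, T, eps, eps_nn, b, z):
    """Energy, entropy and free energy per contact (k_B = 1).

    q, a   : order parameters Q and A
    s_c    : configurational entropy per residue at (Q, A), supplied by the caller
    eps    : mean native contact energy (negative)
    eps_nn : mean non-native contact energy
    b      : standard deviation of non-native contact energies
    z      : coordination number (contacts per residue)
    """
    e = eps * q + (eps_nn - b**2 / T) * a
    s = s_c / z - b**2 * a / (2.0 * T**2)
    f = eps * q - T * s_c / z + (eps_nn - b**2 / (2.0 * T)) * a
    return e, s, f

# Illustrative numbers only.
T = 1.0
e, s, f = thermo_per_contact(q=0.3, a=0.4, s_c=1.2, T=T,
                             eps=-1.0, eps_nn=-0.1, b=0.2, z=2.0)

# The variance term lowers E by b^2 A / T but lowers T*S by only half of that.
energy_drop = 0.2**2 * 0.4 / T
entropy_drop_times_T = 0.2**2 * 0.4 / (2.0 * T)
```

With these inputs the free energy returned by the function equals `e - T * s`, and the residual variance contribution to the free energy is `-(energy_drop - entropy_drop_times_T)`.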

A plot of the free energy at the folding temperature of the Gō model, $T_F^{\circ}$, as a function of $(Q,A)$ is shown in the first row of 
figure 2 for equation (7c), together with the analytical model of the configurational entropy $S_c(Q,A)$ described below. 

Figure 2 also shows plots of $E(Q,A)$, $S(Q,A)$, and $F(Q,A)$, as well as the number of states at energy $E$, taken from the 
simulation data for the off-lattice model (see section III B). Plots are at the folding temperature $T_F^{\circ}$ of the Gō model, for 
several different values of $b$ indicated. 



B. Conformational entropy in terms of packing fraction 

The fraction of non-native contacts $A$ is not independent of $Q$. As more native interactions are present, fewer non-native 
interactions are allowable, and eventually there can be no non-native contacts in the native structure. Previous studies that 
investigated the folding rate at fixed stability have explicitly included this $Q$-dependence in equation (7c) [40]. Here our 
intention is to plot the folding rate at fixed temperature rather than at fixed stability. For this purpose it is formally more 
convenient to keep this $Q$-dependence implicit in $A$. Again this manifests itself only as a region of allowed values of $(Q,A)$, 
which can be seen in figure [J 

The entropy loss due to native contacts is of a different functional form than the entropy loss due to non-native contacts. 
The entropy loss due to native contacts arises from a specific set of polymeric constraints. The entropy loss due to non-native 
contact formation arises from an increase in polymer density, a non-specific constraint. There are many collapsed unfolded 
states with non-native interactions present, but only one folded state (neglecting the much smaller entropy due to native 
conformational fluctuations). 

We note that the conformational entropy $S_c(Q,A)$ takes into account the extent to which polymer configurations tend to have 
residue pairs in proximity, such that if they interacted, that interaction would be considered a non-native contact. However, 
the strength of the typical non-native interaction ($\sim \epsilon_{NN} \pm b$) is controlled by 2 free parameters in the theory. When both $\epsilon_{NN}$ 
and $b$ are set to zero, the thermal entropy reduces to that in the putative Gō model, with the configurational entropy $S_c(Q,A)$ 
remaining unchanged. 

The $A$-dependence in $S(Q,A)$ is related to the physics of collapse, since at a given value of $Q$, the fraction of non-native 
contacts $A$ depends on the packing fraction $\eta$ of non-native polymer. When $MQ$ native contacts are present, $MA_{\max} = M(1-Q)$ 
non-native contacts are allowable, and $MA_{\max}$ non-native contacts are present when $\eta = 1$. 

As detailed in Appendix A, a mean field approximation allows one to estimate the conformational entropy $S_c(Q,\eta)$ of a 
disordered polymer at $Q$ with packing fraction $\eta$ as: 



Sc{Q,ri) = N(l-Q){\n--(^)\n(l-r^) ^ 



N(l-Q)s„„(Q,r,). (8) 



Here r](Q) = J(Q)~^^^ = [ni^(Q)/N(l- Q)Y^^, where I is the mean loop length formed by native contacts at Q (see equa- 
tion ( IA.14t '). and tii^iQ) is the total number of loops at Q (equation ( lA.lSh . In equation (|8} the quantity in curly brackets is 
the entropy per residue for the remaining disordered polymer at Q. 

Figure 2 shows a plot of the entropy per disordered residue at $Q$, $s_{nn}(Q,\eta) = S_c(Q,\eta)/N(1-Q)$, as a function of $\eta$, for various 
values of $Q$. This shows that the non-native polymer density where most of the states are (where $s_{nn}(\eta)$ is maximal) is an 
increasing function of nativeness $Q$. 
From equations (2) and (8), 

$$s_c(Q,A) = (1-Q)\, s_{nn}\!\left(Q, \frac{A}{1-Q}\right) . \tag{9}$$

The entropy per residue $s_c(Q,A)$ in equation (7b) is then obtained from equation (8) using 

$$s_c(Q,A) = s_c(Q,\eta)\Big|_{\eta = A/(1-Q)} \tag{10}$$

(see equation (A.3)). The free energy surface on which dynamics occurs can then be obtained from equation (7c), and is 
plotted in the first row of figure 2. This is the reaction surface for the coordinates $(Q,A)$. 



C. Effect of non-native interactions on free energy barrier and folding rate 

In the Gō model, non-native contacts are given coupling energies of zero. The Gō folding temperature $T_F^{\circ}$ is taken to be the 
temperature where the unfolded and folded thermodynamic states have equal probability. This is given through equation (7c) 
when $F(0,A) \approx F(1,0)$ and $\epsilon_{NN} = b^2 = 0$. We are taking $Q \approx 0$ in the unfolded state and $A = 0$ in the folded state (see figure^). 
This yields a Gō folding temperature of 

$$T_F^{\circ} = \frac{z|\epsilon|}{s_c(0, A^*(0))} , \tag{11}$$

where $A^*(Q)$ is the most probable value of $A$ at a given $Q$, as determined below. 

When considering the simulation data, the folding temperature is taken to be the temperature in the Gō model where the 
unfolded and folded thermodynamic minima have equal free energies (these minima need not be precisely at $Q = 0$ and $Q = 1$). 
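The equal-free-energy condition in the theory can be illustrated with a small numerical check. This is a sketch under the Gō limit (no non-native energy), with made-up values for the coordination number and the unfolded-state entropy per residue; the function names are hypothetical.

```python
def go_folding_temperature(eps, z, s_c_unfolded):
    """Temperature at which the unfolded (Q = 0) and folded (Q = 1) states
    have equal free energy in the Go limit: T = z |eps| / s_c(0, A*(0))."""
    return z * abs(eps) / s_c_unfolded

def go_free_energy_per_contact(q, s_c, T, eps, z):
    """Free energy per contact in the Go limit (eps_nn = b = 0, k_B = 1)."""
    return eps * q - T * s_c / z

# Illustrative values: eps = -1, z = 2, unfolded entropy per residue = 1.6.
eps, z, s_unfolded = -1.0, 2.0, 1.6
T_go = go_folding_temperature(eps, z, s_unfolded)

f_unfolded = go_free_energy_per_contact(0.0, s_unfolded, T_go, eps, z)
# The folded state is taken to have no configurational entropy.
f_folded = go_free_energy_per_contact(1.0, 0.0, T_go, eps, z)
```

At `T_go` the two free energies coincide, which is the defining property of the folding temperature in the theory.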

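The most probable value of $A$ at a given $Q$ for a protein in thermal equilibrium, $A^*(Q)$, is obtained from 

$$\left.\frac{\partial F(Q,A)}{\partial A}\right|_{A = A^*,\, Q} = 0 . \tag{12}$$

Using equations (7c) and (8), this gives: 

$$\left.\frac{\partial s_{nn}}{\partial \eta}\right|_{\eta^*} = \frac{z\,\epsilon_{NN}}{T} - \frac{z\,b^2}{2T^2} , \tag{13}$$

where $\eta^*(Q)$ is the most probable packing fraction at a given value of $Q$. 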
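Using the following definitions: 

$$\Delta A^*(Q) \equiv A^*(Q) - A^*(0) ,$$

$$\Delta s_{nn}(Q) \equiv s_{nn}(Q, A^*(Q)) - s_{nn}(0, A^*(0)) ,$$

the minimal free energy at $Q$, $F(Q, A^*(Q))$, relative to the minimal free energy $F(0, A^*(0))$ in the unfolded state, is obtained 
from equation (7c): 

$$\Delta F(Q,T) = F(Q, A^*(Q|T)) - F(0, A^*(0|T)) \tag{14}$$

$$\frac{\Delta F(Q,T)}{M} = Q\left(\epsilon + \frac{T\, s_{nn}(0, A^*(0))}{z}\right) - \frac{T\,(1-Q)\,\Delta s_{nn}(Q)}{z} + \left(\epsilon_{NN} - \frac{b^2}{2T}\right)\Delta A^*(Q) . \tag{15}$$

With the temperature set to the Gō transition temperature $T_F^{\circ}$, the first term in brackets in equation (15) vanishes. The free 
energy barrier (over $T_F^{\circ}$) at the Gō transition temperature can then be written as 

$$\frac{\Delta F^{\ddagger}}{T_F^{\circ}} = \frac{\Delta F^{\circ\ddagger}}{T_F^{\circ}} + M\left(\frac{\epsilon_{NN}}{T_F^{\circ}} - \frac{b^2}{2\,(T_F^{\circ})^2}\right)\Delta A^*(Q^{\ddagger}) , \tag{16}$$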
where $\Delta F^{\circ\ddagger}$ is the barrier height at $T_F^{\circ}$ with $\epsilon_{NN} = b = 0$, i.e. the putative Gō barrier height, and is given by: 

$$\frac{\Delta F^{\circ\ddagger}}{T_F^{\circ}} = N(1-Q^{\ddagger})\left(-\Delta s_{nn}(Q^{\ddagger})\right) , \tag{17}$$

where the saddle point is located at $(Q^{\ddagger}, A^{\ddagger} = A^*(Q^{\ddagger}))$. Note that $\Delta s_{nn}(Q) < 0$ because the disordered polymer dressing larger 
native cores is more collapsed than that for smaller native cores. One can see that the barriers scale extensively as a result of 
the mean field approximations made above. 

So we see from equation (16) that the folding barrier lowers with increasing non-native interaction strength, namely if 
$\epsilon_{NN} < 0$ ($b^2 > 0$ always), so long as $\Delta A^*(Q^{\ddagger}) \equiv \Delta A^{\ddagger} > 0$. So now we investigate the conditions for which $\Delta A^{\ddagger} > 0$. 

From equation (A.1), the condition $\Delta A^{\ddagger} > 0$ is equivalent to 

$$\Delta A^{\ddagger} = \eta^*(Q^{\ddagger})\,(1-Q^{\ddagger}) - \eta^*(0) > 0 , \tag{18}$$

where $\eta^*$ is determined from equation (13). 

We are interested in the effect on the barrier when non-native interactions are imagined to initially increase from zero. For 
$\epsilon_{NN}, b \approx 0$, the most probable packing fraction is interpreted geometrically through equation (13) as the value of $\eta$ where the 
entropy per disordered residue is maximal, i.e. the maximum of the curves in figure 2. When $\epsilon_{NN} < 0$ and/or $b > 0$, $\eta^*$ is 
determined as the value of $\eta$ slightly to the right of the maximum in the curves in figure 2. The most probable packing 
fraction as a function of $Q$ is plotted in figure 3. 

Equation (18) is not a particularly robust condition. While $\eta^*(Q)$ is certainly a monotonically increasing function of $Q$, as 
can be seen from figure 3, the factor of $(1-Q)$ in equation (18) de-emphasizes, or may reverse, the trend in $A^*(Q)$. In the 
earlier work addressing the trend in rates at fixed stability rather than fixed temperature, the factor determining whether rates 
would increase was merely the increase in packing fraction $\Delta\eta(Q)$ by itself [40]. 

The derivative of $s_{nn}$ in equation (13) can be straightforwardly determined from equation (8), and equation (13) then becomes 
a non-linear equation for $\eta^*(Q)$ that can be solved numerically. The result is shown in figure 3. The packing fraction increases 
as the length of disordered loops becomes shorter (inset of fig. 3), and thus increases monotonically with nativeness $Q$. 
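A numerical solution of this kind can be sketched with a simple bisection. The condition solved is that the derivative of the entropy per disordered residue with respect to packing fraction equals $z\epsilon_{NN}/T - zb^2/(2T^2)$. The entropy derivative used below is a hypothetical concave stand-in chosen only so the example runs; it is not the mean-field expression of the paper, and all names and numbers are assumptions.

```python
def most_probable_packing(ds_deta, T, z, eps_nn, b,
                          lo=1e-6, hi=1.0 - 1e-6, tol=1e-12):
    """Solve ds_nn/d(eta) = z*eps_nn/T - z*b^2/(2 T^2) for eta by bisection.

    ds_deta : callable returning the derivative of the entropy per disordered
              residue with respect to eta (supplied by the caller)
    """
    rhs = z * eps_nn / T - z * b**2 / (2.0 * T**2)

    def g(eta):
        return ds_deta(eta) - rhs

    g_lo, g_hi = g(lo), g(hi)
    if g_lo * g_hi > 0:
        raise ValueError("no sign change in the bracketing interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid                    # root lies in the lower half
        else:
            lo, g_lo = mid, g_mid       # root lies in the upper half
    return 0.5 * (lo + hi)

def toy_ds_deta(eta, eta0=0.4, c=5.0):
    """Derivative of a hypothetical entropy s = s0 - c (eta - eta0)^2."""
    return -2.0 * c * (eta - eta0)

eta_go = most_probable_packing(toy_ds_deta, T=1.0, z=2.0, eps_nn=0.0, b=0.0)
eta_nn = most_probable_packing(toy_ds_deta, T=1.0, z=2.0, eps_nn=-0.1, b=0.2)
```

In this toy case the unperturbed solution sits at the entropy maximum, and attractive or fluctuating non-native energies shift it to a slightly larger packing fraction.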

Once $\eta^*(Q)$ is known, $\Delta A^*(Q)$ can be obtained from (18). This determines the trend in the barrier height by equation (16). 
A plot of $\Delta A^*(Q)$ is shown in figure 3. We can see that if the barrier position resides in a window of $Q$ where $\Delta A^*(Q) > 0$, 
the barrier decreases with increasing non-native interaction strength, for weak non-native interactions. Otherwise the barrier 
increases with increasing non-native interaction strength. 
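The window of Q in which the barrier is lowered can be located from a tabulated packing fraction, as in the sketch below. It evaluates $\eta^*(Q)(1-Q) - \eta^*(0)$ on a grid; the grid and the packing-fraction values are hypothetical numbers chosen for illustration, not results from the paper.

```python
def delta_a_star(q_values, eta_star):
    """Change in the most probable non-native contact fraction relative to
    the unfolded state: eta*(Q) (1 - Q) - eta*(0).

    Assumes q_values[0] == 0 so that eta_star[0] is the unfolded-state value.
    """
    eta0 = eta_star[0]
    return [eta * (1.0 - q) - eta0 for q, eta in zip(q_values, eta_star)]

# Hypothetical, monotonically increasing packing fraction on a coarse Q grid.
q_values = [0.0, 0.2, 0.4, 0.6, 0.8]
eta_star = [0.30, 0.45, 0.55, 0.62, 0.70]

d_a = delta_a_star(q_values, eta_star)
# Values of Q where weak non-native interactions would lower the barrier.
window = [q for q, x in zip(q_values, d_a) if x > 0]
```

Even though the packing fraction rises monotonically here, the factor of (1 - Q) makes the difference negative at large Q, so only the low-Q part of the grid falls inside the window.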

When non-native interactions are weak, the folding kinetics are single exponential: 

$$k_F = k_0(\epsilon_{NN}, b)\; e^{-\Delta F^{\ddagger}(\epsilon_{NN}, b)/T} . \tag{19}$$

Increasing the strength of non-native interactions slows the prefactor $k_0$, due to the effects of transient trapping. However, as 
$\epsilon_{NN}$ and $b$ are increased from zero, this slowing effect on $k_0$ does not become significant until a non-zero characteristic value, 
which would indicate the onset of a dynamic glass transition in an infinite sized system (see [40, 52, 53, 54] for more detailed 
treatments of this effect). In a finite system the activation time $\sim k_0^{-1}$ increases dramatically but only when $b > b^*$ or $\epsilon_{NN} > \epsilon_{NN}^*$. 
The values of the energy scales $b^*$ and $\epsilon_{NN}^*$ are of order $T$, so there is a fairly large window upon increasing $b, \epsilon_{NN}$ from zero 
where the prefactor $k_0$ is unaffected to the first approximation. In this regime the effects on rate are governed solely by the 
effects on barrier height. Hence the decrease in barrier height shown above as $\epsilon_{NN}, b$ are increased from zero may be associated 
with an increase in folding rate. 
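In the regime where the prefactor is unaffected, the change in rate follows directly from the change in barrier height. The sketch below assumes a constant prefactor and a barrier shift (in units of temperature) of $M(\epsilon_{NN}/T - b^2/2T^2)\,\Delta A^{\ddagger}$, evaluated at the Gō folding temperature; the function names and numerical values are illustrative assumptions.

```python
import math

def barrier_shift_over_T(M, T, eps_nn, b, delta_a_ts):
    """Change in (barrier / T) caused by weak non-native interactions.

    M          : total number of contacts in the collapsed globule
    T          : temperature (k_B = 1), taken here as the Go folding temperature
    delta_a_ts : change in non-native contact fraction between the unfolded
                 state and the transition state
    """
    return M * (eps_nn / T - b**2 / (2.0 * T**2)) * delta_a_ts

def rate_ratio(M, T, eps_nn, b, delta_a_ts):
    """Folding rate relative to the Go model, assuming an unchanged prefactor."""
    return math.exp(-barrier_shift_over_T(M, T, eps_nn, b, delta_a_ts))

# Illustrative numbers: pure variance (eps_nn = 0), small positive delta_a_ts.
speedup = rate_ratio(M=100, T=1.0, eps_nn=0.0, b=0.2, delta_a_ts=0.05)
slowdown = rate_ratio(M=100, T=1.0, eps_nn=0.0, b=0.2, delta_a_ts=-0.05)
```

A positive `delta_a_ts` gives a ratio above one (faster folding), while a negative value gives a ratio below one, mirroring the sign condition on the barrier.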

In the next section we test the theoretical prediction directly with simulations of a model protein. The upshot is shown in 
figures^l(b) and (c) below, which show indeed an increase in folding rate with increasing non-native interaction strength. 



III. COMPARISON BETWEEN THEORETICAL PREDICTION AND SIMULATION RESULTS 

We have thoroughly explored the range of validity of the approximations made in the analytic theory by comparing the 
predictions with the results obtained from simulations on a Go model increasingly perturbed by the addition of non-native 
interactions (see Appendix D for details on simulation). 

A close and quantitative comparison of the results from theory and simulations is possible if corresponding 
thermodynamic quantities and reaction coordinates are identified. For this purpose, before we proceed to test the prediction on rate 
enhancement, three main points of the theory have to be examined in comparison with the results from simulations: 

• definition of the reaction coordinates Q and A 

• allowed values of the reaction coordinates (i.e. correlation between Q and A) 

• approximations made in the definition of energy and entropy as functions of the reaction coordinates 
These points are clearly interconnected and all affect the detailed shape of the free energy landscape, the value of the folding 
temperature, and the identification of the folded, unfolded, and transition state ensembles. We expect that the assumptions we 
have made in the analytical theory do not qualitatively change the theoretical predictions; nevertheless, a careful dissection of 
the basic ingredients we have used is needed for a quantitative assessment of the results. 

In the following we discuss in detail each of the points above. Unless otherwise specified the following results are all 
obtained from simulations at the Gō folding temperature $T_F^{\circ}$, for all values of $b/\epsilon$. 

A. Definition of reaction coordinates 

The reaction coordinate $Q$, defined as the fraction of native contacts formed in a given protein configuration, is readily 
associated with configurations sampled by simulations (see Appendix D). More care has to be used in transposing the other 
reaction coordinate we have used in the theory, $A$ (defined as the fraction of non-native contacts formed), to the analysis of 
simulation data. In the analytical theory we have assumed that the maximum number of non-native contacts that can be 
formed at a certain stage of the folding reaction does not depend on the perturbation strength, and is a function of the degree 
of nativeness, $Q$, that is $A_{\max}(Q) = 1 - Q$, $\forall b$ (see equation (A.1)). This implies that no non-native contacts can be formed in 
the native state ($A_{\max} \to 0$ if $Q \to 1$), and vice-versa ($A_{\max} \to 1$ if $Q \to 0$). This assumption in the theory allows us to simplify 
the analytical calculations but does not qualitatively affect the results. The dependence on $Q$ of the maximum number of 
non-native contacts can be directly checked in simulations. In this regard, an important difference between theory and simulations 
is that a certain number (typically $\sim 5$) of non-native contacts can be accommodated in a protein configuration with $Q \approx 1$ 
and minimal (less than 1 Å) rms deviation from the pdb native structure. The increased number of contacts around the native 
configurations arises mainly from the fact that native or non-native contacts are considered formed in a small but finite length 
range (typically $\sim 1$ Å) around the minimum of the interaction potential. This leads to probable formation of some non-native 
contacts as the protein undergoes fluctuations around the native state. 

Figure 5 shows that a subset of 6 non-native contacts is formed with probability > 0.25 in the native state ensemble for 
$b/\epsilon = 1.3$. Similar results are obtained for all values of $b/\epsilon$ used in this study, although the particular set of non-native 
contacts formed in each case depends on the choice of non-native interactions (data not shown). 

Contacts that are easily formed in the native state can not be considered non-native, even when they are not listed as native 
contacts in the unperturbed Go-like Hamiltonian. In fact, contacts that can be made in the native state are not competing 
against the formation of the native structure, rather they are assisting it. In order to remove this effect, we introduce a new 
reaction coordinate A', defined as the fraction of non-native contacts formed, restricting the list of non-native interactions 
only to the ones with a probability of contact formation in the native state ensemble smaller than a cutoff value $p_c$. The native 
ensemble for each sequence is identified as all configurations with Q > 0.9 sampled in simulation for that sequence. The 
results presented in the following are all obtained with a probability cutoff $p_c = 0.1$. Smaller values of $p_c$ yield essentially 
the same results. The reaction coordinate A' is then used in this study to compare results from simulation with the theoretical 
predictions. 
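The filtering step behind the corrected coordinate can be sketched as follows. The data layout (a list of frames, each a Q value plus the set of formed contacts), the function names, and the normalisation of the corrected coordinate by the number of retained pairs are assumptions made for the example; only the Q > 0.9 native-ensemble criterion and the cutoff of 0.1 come from the text.

```python
def retained_non_native_pairs(frames, non_native_pairs, q_native=0.9, p_c=0.1):
    """Keep only non-native pairs that are rarely formed in the native ensemble.

    frames           : list of (Q, set_of_formed_contacts) from a trajectory
    non_native_pairs : list of candidate non-native residue pairs
    q_native         : frames with Q above this value define the native ensemble
    p_c              : probability cutoff for contact formation in that ensemble
    """
    native_frames = [formed for q, formed in frames if q > q_native]
    if not native_frames:
        raise ValueError("no frames in the native ensemble")
    kept = []
    for pair in non_native_pairs:
        # Fraction of native-ensemble frames in which this pair is formed.
        p = sum(pair in formed for formed in native_frames) / len(native_frames)
        if p < p_c:
            kept.append(pair)
    return kept

def a_prime(formed, kept_pairs):
    """Fraction of the retained non-native pairs formed in one configuration."""
    return sum(pair in formed for pair in kept_pairs) / len(kept_pairs)

# Toy trajectory (all contacts are made up).
frames = [
    (0.95, {(1, 5)}),
    (0.92, {(1, 5), (2, 7)}),
    (0.97, set()),
    (0.30, {(2, 7), (3, 8)}),
]
candidates = [(1, 5), (2, 7), (3, 8)]
kept = retained_non_native_pairs(frames, candidates)
value = a_prime({(2, 7), (3, 8)}, kept)
```

In this toy trajectory the pairs (1, 5) and (2, 7) appear in the native ensemble too often to be treated as competing with the native structure, so only (3, 8) is retained.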

Another approximation that can be directly checked in simulation is on the maximum number of non-native contacts that 
can be formed at different stages of the folding reaction. In the analytical theory, the fraction of non-native contacts, $A$, is a 
function of the fraction of native contacts formed in a configuration, $Q$, and of the packing fraction $\eta$ of the non-native part of 
the protein: $MA = \eta(1-Q)M$ (see equation (A.1)), with $0 < \eta < 1$, $\forall Q$. The maximum number of non-native contacts is then 
$A_{\max}M = (1-Q)M$, and the maximum total fraction of all contacts (native and non-native) is $(A+Q)_{\max} = 1$, $\forall Q$. Indeed, the 
maximum number of all contacts (both native and non-native) recorded in simulations is close to the number of native contacts 
formed in the native state, i.e. $M(Q+A')_{\max} \approx M$, for all values of the parameter $b$ examined in this study (see figure 6(a)). 
Figure 6(b) shows the behavior of the average number of non-native contacts formed in simulation (both coordinates $A$ and $A'$ 
are plotted), as a function of $Q$, for a perturbation $b/\epsilon = 0.5$ (right panel), and the value of $Q$ corresponding to the maximum 
of $\langle A' \rangle$ (the corresponding $Q$ for the uncorrected coordinate $\langle A \rangle$ is also shown). Interestingly, the peak in the average number 
of non-native contacts is detected for a value of $Q$ corresponding to a pre-transition state stage of the folding. A pre-TS peak 
is observed in both theory and simulations, although in the theory it is closer to the unfolded state than what is detected in 
simulations (see figures|6jb) and^Ia)). 

Figures|5Ja)-(b) and0a)-(b) present a thorough comparison between the allowed and most probable values for the fraction
of non-native contacts at different stages of the folding, as obtained from theory and simulations. Although the maximum
number of non-native contacts is always detected in a pre-TS region, independently of the value of $b/\epsilon$, it is clear from figures 6(a)
and0^a) that larger values of $b/\epsilon$ yield a larger number of non-native contacts formed, particularly in the unfolded ensemble.
Interestingly however, the number of non-native interactions rapidly decreases to zero in the region with very small $Q$. The
cause of this effect is not contained in the analytical expressions ( ITct . where it is assumed that $A_{max} = 1 - Q$. This result is
due partly to coupling between non-native contacts and the angle and dihedral terms in the simulation Hamiltonian (which
are not present in the theory). This is a finite size effect which tends to increase the polymer stiffness relative to that in the
theory, which used a bulk approximation for thermodynamic quantities. Fully compact states in which only non-native
interactions are present have a large energetic cost and are formed very rarely. Another source of this effect is that forming
collapsed conformations induces some native contacts to be formed, due to the finite range of interactions. This effect is
particularly important for short-range contacts among residues closely separated in sequence, and does not necessarily go
away as one considers larger size systems. This is the complementary effect to the already mentioned fact that in simulations
non-native interactions are formed in the native state (which led us to redefine the simulation reaction coordinate as
$A'$).

In order to quantify this effect we have generated a large set (50) of non-native energy distributions with high and very
high variance ($b/\epsilon > 2$ and $b/\epsilon \gg 2$). Sequences with these high values of $b/\epsilon$ are not able to fold to the selected native
structure, but are useful to explore the region of the configurational space corresponding to compact structures with the
maximum number of non-native contacts formed. We expect the glass temperature of these sequences to be higher than their
folding temperature (see next section). After an initialization at very high temperature ($T \gg T_f$), a large number (> 1000)
of quenching simulations ($T \ll T_f$) has been performed for each sequence to generate a representative sample of compact
misfolded structures. The maximum fraction of non-native contacts that can be formed, $A'_{max}$, is thus defined as the largest
value of $A'$ among the vast pool of structures obtained by adding the results from the quenching simulations for high $b/\epsilon$
values to all configurations collected in simulations at any temperature and for any value of $b/\epsilon$ used in this study. Figure
[Tib) shows the behavior of $A'_{max}$ as a function of the fraction of native contacts present in the structures. The theoretical
assumption on the maximum fraction of non-native contacts, $A_{max} = 1 - Q$, holds remarkably well down to values of $Q \approx 0.15$,
which corresponds to the unfolded state minimum in the free energy landscape (see figure[0. From these results we then expect
the unfolded region of a free energy landscape associated with the simulated protein Hamiltonian to be somewhat compressed
toward smaller values of $A$ with respect to the theoretical prediction.
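
The envelope $A'_{max}(Q)$ described above can be extracted from the pooled configurations with a simple binning in $Q$. The sketch below is illustrative only: it assumes that $Q$ and $A'$ have already been computed for every pooled structure, and the function name and the bin count are arbitrary choices, not those of the original analysis.

```python
import numpy as np

def max_nonnative_envelope(q, a_prime, n_bins=20):
    """Largest A' observed in each Q bin, with the theoretical bound 1 - Q.

    Returns (bin_centers, a_max, theory_bound); empty bins are reported as NaN.
    """
    q = np.asarray(q, dtype=float)
    a_prime = np.asarray(a_prime, dtype=float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each structure to a bin index; Q = 1 is folded into the last bin.
    idx = np.clip(np.digitize(q, edges) - 1, 0, n_bins - 1)

    centers = 0.5 * (edges[:-1] + edges[1:])
    a_max = np.full(n_bins, np.nan)
    for k in range(n_bins):
        in_bin = idx == k
        if in_bin.any():
            a_max[k] = a_prime[in_bin].max()

    # Theoretical assumption A_max = 1 - Q, evaluated at the bin centers.
    return centers, a_max, 1.0 - centers
```

Comparing `a_max` with the returned bound bin by bin shows directly where the assumption $A_{max} = 1 - Q$ starts to fail at small $Q$.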

B. Energy, entropy and free energy landscape 

Figure [^ presents the energy, entropy, and free energy profiles obtained from simulations, as a function of the reaction
coordinates $Q$ and $A'$, for three different values of the perturbation parameter $b/\epsilon$. The corresponding quantities obtained from
the analytical theory, with all the parameters set equal to the simulation parameters (i.e. rightmost column in table I) and
$b = 0.3\epsilon$, are also shown for comparison. For a more direct comparison with the results from simulations, the thermodynamic
quantities from theory are only plotted in regions populated with probability larger than 2 x 10"^, as we have typical sam- 
plings of ~ 5 X 10^ configurations in folding/unfolding simulations. It is apparent from figure^that the region of the (Q,A') 
space populated with high probability in simulations differs somewhat from the (Q,A) region predicted by theory. Several 
factors are responsible for this difference and have to be considered before one tests the predictive power of the theory with 
the simulation results: 

(i) The unperturbed energy function used in simulation includes a self-avoiding term for all non-native contacts, which is
maintained in the perturbed Hamiltonian (see equations (D.3), (D.4) and figure 12). This energy term is not explicitly considered
in the theory. The short-distance repulsive interactions limit the formation of non-native contacts (especially for small values
of $b/\epsilon$), and shift the most populated regions of the folding landscape toward lower values of $A$. This effect also accounts for
most of the differences in the energy landscape between theory and simulation results (see figure [Q.

(ii) The analytic expressions are obtained in the thermodynamic limit, while simulations are performed for a small protein (57
residues). The theoretical expressions do not explicitly keep track of finite-size effects due to polymer stiffness. However the
extra effects of polymer stiffness seen in the simulations only enhance the theoretically predicted rate acceleration effect (see
section inr^ .

(iii) The functional form for the entropy is approximated in the theory and it is not expected to reproduce the
simulation results quantitatively. In particular, the theoretical assumption on the allowed values of $A$ at different $Q$ (i.e. equation (A.1))
directly enters the derivation of the entropy (see Appendix A for details), and contributes to the relative "distortion" of
the theoretical free energy landscape with respect to the landscape in the simulations. Nevertheless, the overall qualitative
behavior of the entropy is correctly captured by the theory (see third column of figure

(iv) The position of the folded and unfolded free energy minima emerging from simulation data differs from $Q = 0$ and $Q = 1$,
as assumed in the theory (see also section II C).

Overall, the destabilization of the folding free energy landscape upon introduction of non-native energy perturbation is
strongly reduced in simulations with respect to what is predicted by the theory. In fact, while in the theory a perturbation
of $b/\epsilon \approx 1$ renders a protein unfoldable (i.e. $T_f/T_g \approx 1$, see equation M\\ and ref. [40]), it is found in simulations that
all sequences generated with a perturbation parameter $b/\epsilon < 1.7$ (entering the Hamiltonian (D.4)) are able to reversibly
fold/unfold at the Gō transition temperature $T_f^{\circ}$. The next section quantifies this difference in the destabilization of the folding
mechanism by comparing the folding and glass temperatures computed in simulations with their corresponding theoretical
predictions.

C. Folding temperature and glass temperature



The folding temperature $T_f$ of each protein model is estimated in simulation as the temperature corresponding to the peak
in the specific heat curve (see figure|S^)). This value is in good agreement (within the error bars) with the value obtained from
the alternative definition of $T_f$ as the temperature at which the folded and unfolded states have the same free energy (see
figureU)).

From equation ( TTct . upon increasing non-native interaction strength (increasing $b$, and/or increasing $|\epsilon_{NN}|$ with $\epsilon_{NN} < 0$), the
free energy $F(Q)$ lowers with respect to the Gō free energy at fixed temperature $T = T^{\circ}$. During this change of Hamiltonian,
the free energy of the native structure remains roughly constant at (see figure |S](c)). Even though the unfolded state is 
stabilized with respect to the folded state during the process of increasing non-native interaction strength at fixed temperature,
the folding rate nevertheless accelerates, because the free energy of the transition state lowers more than that of the unfolded
state. This is described in more detail below, with the result shown in figure 10(b).

The thermodynamic glass temperature $T_g$ can be estimated by using the results obtained in the framework of the random
energy model (REM) [55, 56]. As the energetic frustration of the system arises from randomly assigned non-native
interactions, we assume that the energy of compact (misfolded) structures in the unfolded ensemble is Gaussianly distributed, with
mean value $\langle E_{nn}(Q_u) \rangle = M A_{max} \epsilon_{NN}$ and variance $\delta E_{nn}^2(Q_u) = b^2 A_{max} M$, where $M A_{max}$ is the maximum number of non-native
contacts the protein can form. In the theory the maximum number of non-native contacts was approximated at $Q \approx 0$ as the
total number of native contacts, $M A_{max} = M$. As we have already discussed in section III A, the actual maximum number of
non-native contacts detected in simulation is smaller than the theoretical value, and it is expected to vary (slightly) with different
realizations of the non-native noise (see figure|5).

The REM glass temperature is defined by the vanishing of the thermal entropy ([55, 56]), which corresponds to setting
equation dTbl to zero:

\[
S(Q_u, A_{max}, T_g) = N s_c(Q_u, A_{max}) - \frac{\delta E_{nn}^2(Q_u)}{2 T_g^2} = 0,
\tag{20}
\]

however here we let $A_{max}$ be a new parameter. This gives for the glass temperature:

\[
T_g = \sqrt{\frac{\delta E_{nn}^2(Q_u)}{2 N s_c(Q_u, A_{max})}},
\tag{21}
\]

where $\delta E_{nn}^2(Q_u) = M A_{max} b^2$ is the energetic variance over the set of misfolded structures. For each protein model (i.e., each
value of $b/\epsilon$) we have performed several (more than 500) short quenching simulations to explore the compact configurations in
the unfolded ensemble. A different open configuration is initially created by means of ancillary high temperature simulations
(with $T \gg T_f$), then rapidly quenched to very low temperatures ($T \approx T_f/10$, $T \approx T_f/25$, and $T \approx T_f/50$). The fluctuations
of the non-native energy in the compact misfolded configurations recorded during the quenching simulations are used to
compute $\delta E_{nn}^2(Q_u)$ entering expression (21).

Figure 9(a) shows the folding temperature $T_f$ and the glass temperature $T_g$ obtained from simulation, as a function of the
strength of the non-native energy perturbation, $b$ (in units of the native energy per contact, $\epsilon$). The folding temperature is
almost constant in the range shown, while the glass temperature rises from zero ($b = 0$ corresponds to the plain Gō-like
model with no energetic frustration, see equation (D.3)), to values close to $T_f$ for large non-native perturbations ($b > 1.6$).
When $T_g/T_f \approx 1$ many low energy misfolded structures compete with the native state and folding is dramatically slowed
down. As the ratio $T_g/T_f$ increases beyond unity, the system is no longer self-averaging, and different realizations of the
non-native perturbation can lead to different folding mechanisms consistent with the same native topology. This point is discussed
in a separate publication [49]. The glass temperatures predicted by the theory for different values of $b$ are also obtained from
equation (21), with $\delta E_{nn}^2 = M A_{max}(Q_u)\, b^2$, $M A_{max}(Q) = M(1-Q)$, and $Q_u$ corresponding to the unfolded free energy minimum at
$T_g$. The theoretical folding temperature is evaluated as described in section II C. The comparison of the folding and glass
temperatures from simulation with the corresponding values predicted by the theory (dotted curves in figure 9(a)) clearly
shows that the destabilizing effect of the non-native energy perturbation on the folding process (quantified by the ratio $T_g/T_f$)
is much reduced in simulation with respect to the theoretical prediction. Each value of $b$ used in simulation ($b_{sim}$) is plotted
in figure 9(b) as a function of the value of $b$ used in the theory ($b_{theory}$) which yields the same $T_g$. The corresponding $T_f(b_{sim})$
(from simulation) and $T_f(b_{theory})$ (from theory) are also found to be equal within the error bars.
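
The REM estimate of the glass temperature used in this section reduces to a one-line formula once the energy variance and the configurational entropy of the compact ensemble are known. The sketch below assumes the standard REM relation, in which the thermal entropy $S_0 - \delta E^2/(2T^2)$ vanishes at $T_g$ (with $k_B = 1$). The function names are hypothetical, and the total entropy is passed in as a single number rather than computed from the theory.

```python
import math

def theoretical_energy_variance(b, n_contacts, q_u):
    """Variance M * A_max * b^2 with the theoretical assumption A_max = 1 - Q_u."""
    return n_contacts * (1.0 - q_u) * b * b

def rem_glass_temperature(energy_variance, entropy):
    """Temperature at which the REM thermal entropy S0 - dE^2 / (2 T^2) vanishes.

    energy_variance : variance of the non-native energy over compact misfolded states
    entropy         : total configurational entropy S0 of that ensemble (k_B = 1)
    """
    if energy_variance < 0.0 or entropy <= 0.0:
        raise ValueError("need a non-negative variance and a positive entropy")
    return math.sqrt(energy_variance / (2.0 * entropy))
```

In simulation the variance would instead be measured from the fluctuations of the non-native energy over the quenched structures, and the same function applies unchanged. Since the variance is proportional to $b^2$, $T_g$ grows linearly with $b$ at fixed entropy.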



D. Folding rate enhancement/depression upon non-native energy perturbation 



The theoretical prediction on folding rate enhancement upon small non-native energy perturbation is expected to hold for
values of $b$ with a corresponding small ratio $T_g/T_f$. A perturbation that largely increases the ratio $T_g/T_f$ will also largely
decrease the prefactor $k_0$ in equation (19), and folding then slows (see discussion in section II C). Because of the extended
range of $b$ for which the condition $T_g/T_f \ll 1$ remains valid in simulation (see previous section), we expect to detect a rate
enhancement in simulation up to values of $b \approx 1$, i.e. the theory is conservative in that the rate is enhanced over a wider range of
$b$ in the simulations.

We have shown in the previous section that the analytical theory correctly reproduces, at a qualitative level, the thermodynamic
quantities measured in simulation, although we have highlighted some quantitative differences. The effect of these
differences on equation (16), which predicts the rate enhancement, is expected to be confined to the precise evaluation of
the difference in the number of non-native contacts between the transition state and unfolded state, $\Delta A^{\neq} = \Delta A^{\neq}(Q^{\neq})$, and
to a lesser extent the precise positions of the transition state and unfolded state, $Q^{\neq}$ and $Q_u$ respectively. Equation (16)
can thus be directly and quantitatively tested if $\Delta A^{\neq}(Q^{\neq})$ is evaluated from simulation. Figure 10(a) shows the difference
$M(\langle A' \rangle_{TS} - \langle A' \rangle_u) = M \Delta A'^{\neq}$ between the average number of non-native contacts $M \langle A' \rangle_{TS}$, formed in the simulated transition
state ensemble, and the average $M \langle A' \rangle_u$, formed in the simulated unfolded ensemble. This number varies slightly over the
range of $b$ values where we expect to find the rate enhancement effect ($T_g/T_f < 0.25$ up to $b \lesssim 1.3\epsilon$). Since the variation of
$M \Delta A'^{\neq}$ with $b$ in this range is smaller than the error bar associated with it, we consider its average over the different $b$ values (straight
red line in figure 10(a)). This average value is then used in equation (16): the resulting quantity $\ln(k/k_0) = (\Delta F_0^{\neq} - \Delta F^{\neq})/T^{\circ}$
is compared with the difference in log folding rate estimated directly from a large set of folding simulations. Figure 10(b)
shows that the agreement between the values predicted from equation (16) (dashed black line) and simulation results for the
rate (red dots) and barrier height (blue dots) is indeed remarkably good up to $b/\epsilon \lesssim 1$-$1.1$.

Folding rates obtained from simulations performed with $b = 0$ and variable $\epsilon_{NN}$ are also plotted in figure 10(c). As
predicted by the theory, rates accelerate when $\epsilon_{NN} < 0$ (attractive non-native interactions) and decelerate when $\epsilon_{NN} > 0$
(repulsive non-native interactions). The theory gives excellent agreement with the simulations in the perturbative limit (dashed
line in figure 10(c)). The effect on the rate (at $T^{\circ}$) of a perturbation with $(b, \epsilon_{NN}) = (0, -b^2/2T^{\circ})$ is equivalent to the case with
$(b, \epsilon_{NN}) = (b, 0)$. When $\epsilon_{NN}$ becomes sufficiently attractive, the prefactor becomes increasingly important in determining the
folding rate, and rates begin to decrease dramatically.
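
The equivalence between random and uniformly attractive perturbations can be made concrete with a small numerical sketch. Equation (16) is not reproduced in this section, so the functional form below is an assumption: it is the simplest perturbative expression consistent with the statements above, in which the barrier shifts by $M \Delta A^{\neq} (\epsilon_{NN} - b^2/2T)$ and the prefactor is left unchanged. The function and argument names are hypothetical.

```python
def log_rate_change(b, eps_nn, delta_contacts, temperature):
    """Assumed perturbative estimate of ln(k / k0).

    b              : width of the random non-native energy distribution
    eps_nn         : mean non-native contact energy (negative = attractive)
    delta_contacts : M * dA, excess of non-native contacts in the transition
                     state relative to the unfolded state
    temperature    : temperature T (k_B = 1), e.g. the Go folding temperature

    ASSUMPTION: the barrier changes by delta_contacts * (eps_nn - b^2 / (2 T))
    and the prefactor k0 is unaffected, which only holds for weak perturbations.
    """
    barrier_shift = delta_contacts * (eps_nn - b * b / (2.0 * temperature))
    return -barrier_shift / temperature
```

With this form, `log_rate_change(b, 0.0, m, T)` and `log_rate_change(0.0, -b * b / (2.0 * T), m, T)` coincide, which is the equivalence stated in the text. The slowdown at large $|\epsilon_{NN}|$ or large $b$ is a prefactor effect and is deliberately outside the scope of this sketch.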



IV. CONCLUSIONS 



In this paper we derived a theory for the change in the free energy barrier height to protein folding, as the strength of 
non-native interactions is varied. We find that the barrier height initially decreases as the strength of non-native interactions 
increases. 

This means that if one considers two idealized protein sequences, one completely unfrustrated (a so-called Gō-like protein),
and one with weak non-native interactions that are either attractive or randomly distributed, the mildly frustrated protein will
tend to fold faster at the same temperature, particularly when the temperature is near the transition temperature of the Gō
protein. This result follows from energy landscape theory [40].

The criterion for the rate to increase is related to an increase in packing fraction in the transition state relative to the unfolded
state (equation (18)).

The rate increase is supported by the theoretical proposal that proteins exhibit a dynamic glass transition at non-zero temper- 
ature. The consequence of this is that the pre-factor to the rate is initially unaffected as non-native interactions are increased 
in strength from zero. Thus rate-determining effects for nearly unfrustrated proteins arise largely from effects on the folding 
barrier. 

Off-lattice simulations of a coarse-grained C$\alpha$ model of src-SH3 were used to test the theoretical predictions. Simulation
results showed even more robust rate-enhancement effects than the theory, due essentially to chain stiffness and contact range
effects that decrease the number of non-native interactions in the unfolded state. When these corrections are included, the
theory and simulations are in very good agreement (figure 10).

The experimental relevance of this effect (reduced number of non-native contacts in the unfolded state) depends on whether
the fraction of native contacts formed, $Q$, is a good reaction coordinate for these systems. For unfrustrated or nearly
unfrustrated systems, $Q$ has been shown to work well as a reaction coordinate in lattice models [57] (lattice models have limited
move-sets that may further hinder the use of $Q$ as a reaction coordinate, relative to the off-lattice system we studied here), and
off-lattice Gō-like models of short proteins [27, 28].

Random non-native interactions as well as attractive non-native interactions both speed the folding rate, when they are
perturbatively small compared to the large native interaction energies that drive folding. The analysis here was done at the
transition temperature of the Gō model. Since the coupling of collapse with folding is fairly generic, it is expected that the
effect of rate-enhancement would also be seen at different temperatures and stabilities.

The effect of rate enhancement by non-native stabilization has been seen in several simulation models [41, 42, 43, 44], as
well as experiments involving the strengthening of non-specific hydrophobic interactions in $\alpha$-spectrin SH3 [47].

Some proteins are thought to be sufficiently frustrated that non-native interactions may limit the folding rate. These proteins
would have non-native energy scales somewhat larger than unity in figure 10(b), at least for some non-native contacts. In some
proteins such as Lysozyme, these non-native interactions are thought to stabilize early-formed structures to prevent
degradation or aggregation [58]. All-atom simulations of the 36-residue Villin headpiece segment suggested that the breaking of
non-native interactions incorrectly packed in the hydrophobic core may form the rate-limiting step on some folding
trajectories (the authors caution however that this may indicate frustration in Villin, or may indicate an artifact of the force-field
employed). For proteins that must escape kinetic traps to fold, it is possible that other evolutionary mechanisms in addition
to funneling may assist folding, such as the selection for amino acids that reduce the escape barrier from the trap [60].

To quantify the rate enhancement it was necessary to treat the entropy of a finite-sized, self-avoiding chain - a problem of 
some interest to polymer physics. The mean-field Flory entropy of a long, self-avoiding chain of packing fraction $\eta$ must
be modified when the chain is sufficiently short that configurations with the characteristic radius of gyration have non-zero 
packing fraction. Then most states have a finite packing fraction dependent on the length of the chain, rather than the bulk 
value of zero. 

From the analysis of simulation data and its comparison with the theory, it emerges that non-native perturbations up to
values of $b \approx \epsilon$ yield values of $T_g/T_f < 0.4$ (see figure 9), which can still be considered realistic for proteins. All sequences
characterized by this range of frustration are fast-folders; however, the range of ruggedness is sufficiently wide that a variety
of scenarios are possible a priori for the folding rate. Both rate enhancement and reduction are compatible with realistic levels
of frustration. This fact may have been exploited by natural evolution to select different effects for different purposes (in the
same structural family). It is worth noticing that the observed rate enhancement/reduction induced by non-native interactions
is limited to less than an order of magnitude (at least for the SH3 fold considered here), thus it cannot be used to explain the
much larger variation (spanning more than 6 orders of magnitude) of folding rates experimentally observed for single-domain,
two-state folding proteins [35, 61].

In this paper we made a very simple generalization of the Gō Hamiltonian for a foldable protein, and found this resulted in
non-trivial and rich behavior of the dynamics of the system. It will be interesting to see what new phenomena emerge from
further considerations of the Hamiltonian describing biomolecular folding and function.

APPENDIX A: ENTROPY OF A PARTIALLY COLLAPSED PROTEIN AS A FUNCTION OF THE NUMBER OF NATIVE AND NON-NATIVE CONTACTS



In terms of the packing fraction the total number of non-native contacts is

\[
MA = M\eta(1-Q),
\tag{A.1}
\]

where $\eta$ is the packing fraction of non-native polymer surrounding the dense ($\eta = 1$) native core.
The mean-field configurational entropy of a self-avoiding polymer of $n$ links with packing fraction $\eta$ is given by [62, 63]

\[
\frac{S_c(\eta)}{n} = \ln\frac{\nu}{e} - \left(\frac{1-\eta}{\eta}\right)\ln(1-\eta).
\tag{A.2}
\]

The conformational entropy of the self-avoiding walk in terms of the fraction of non-native contacts $A$ is given by

\[
S_c(A) = S_c(\eta)\Big|_{\eta = A/(1-Q)}.
\tag{A.3}
\]

Expressions (A.2) and (A.3) imply that the polymer chain in question will tend to have $\eta = 0$ and $A = 0$ since this maximizes
the entropy. However a finite-length chain of $n$ links tends to have a non-zero packing fraction given by

\[
\eta(n) \approx \frac{n a^3}{R_g^3},
\tag{A.4}
\]

where $a^3$ is the volume per monomer and $R_g$ is the radius of gyration of the chain. Up to factors of order unity the RMS size of
the polymer can be used as well. For chains obeying ideal statistics $\eta(n) \approx n^{-1/2}$. For self-avoiding chains in a good solvent,
accounting for swelling gives $\eta(n) \approx n^{-4/5}$. However these expressions for the typical packing fraction are inconsistent with
expression (A.2), which implicitly assumes an infinite chain limit. For finite-length chains, we seek an entropy function which
is peaked at non-zero values of $\eta$.
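
The tension between the bulk entropy and the finite-chain packing fraction can be checked numerically. The sketch below assumes the Flory mean-field form $\ln(\nu/e) - [(1-\eta)/\eta]\ln(1-\eta)$ for the entropy per link, with $\nu$ the number of conformational states per link; the value of $\nu$ and the function names are illustrative assumptions.

```python
import math

def flory_entropy_per_link(eta, nu):
    """Mean-field entropy per link of a self-avoiding chain at packing fraction eta.

    ASSUMPTION: Flory form ln(nu / e) - ((1 - eta) / eta) * ln(1 - eta),
    valid for 0 < eta < 1, with nu the number of states per link.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    return math.log(nu / math.e) - (1.0 - eta) / eta * math.log1p(-eta)

def typical_packing_fraction(n_links, exponent=0.5):
    """Scaling estimate eta ~ n^(-exponent): 1/2 for ideal chains, 4/5 for swollen ones."""
    return n_links ** (-exponent)
```

The entropy per link decreases monotonically from $\ln \nu$ at $\eta \to 0$, so the bulk expression alone always favors zero density, whereas a 57-residue ideal chain has a typical packing fraction of about 0.13. This is the mismatch that the finite-size correction derived next is meant to remove.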

The assumption of ideal chain statistics for protein segments is not as bad as it may at first seem, because disordered polymer 
segments interact with each other in addition to themselves. Polymers in a melt obey Gaussian statistics [64]. Swelling due
to excluded volume is counterbalanced by compression due to the surrounding polymer medium if the protein is sufficiently 
large. However, for polymer loops dressing a native core, self-avoidance must be taken into account to fully treat the effects 
of non-native interactions. 

We take the effects of self-avoidance, finite size, and "inter-loop" interactions into account by letting the number of walks
with density $\eta$ be the number of states at density $\eta$, $e^{S_c(\eta)}$ above, times the probability that an ideal walk of $\ell$ steps has
density $\eta$:

\[
\Omega(\eta,\ell) = e^{S_c(\eta,\ell)}\, p(\eta|\ell).
\tag{A.5}
\]

For smaller values of $\ell$, larger values of $\eta$ are more probable. But at higher values of $Q$, smaller values of $\ell$ are more
probable. Hence the non-native packing fraction tends to increase with folding. This is the effect we are quantifying here.
The number of states of the disordered polymer with packing fraction $\eta$, at degree of nativeness $Q$, is given by

\[
\Omega(\eta,Q) = \prod_{\ell} \Omega(\eta,\ell)^{\,n(\ell|Q)} = \prod_{\ell} \left[ e^{S_c(\eta,\ell)}\, p(\eta|\ell) \right]^{n(\ell|Q)}.
\tag{A.6}
\]

This is the product over all lengths $\ell$, of the number of states for a loop of length $\ell$ and packing fraction $\eta$, times the probability
that the loop of finite length $\ell$ has packing fraction $\eta$, with each factor counted once for each of the $n(\ell|Q)$ disordered loops of length $\ell$ at nativeness $Q$.

We now seek the probability distribution $p(\eta|N)$. Consider for the moment one dimensional random walks of $N$ steps, which
we generalize to three dimensions below. The probability $p(\eta|N)$ is maximal at the value of $\eta$ corresponding to a Gaussian
distribution for the chain (i.e. $N^{-1/2}$ above). Again however, this alone does not account for self-avoidance, which is why
$S_c(\eta,\ell)$ must be included later in the analysis. If we let the fraction of walks with variance $\lambda N a^2$ be given by $p(\lambda|N)$, the
problem of finding $p(\eta|N)$ is equivalent to the problem of finding $p(\lambda|N)$. This is the probability that a walk of $N$ steps has an
anomalous variance of $\lambda N a^2$, given that the most-probable distribution of walks $\bar{p}$ is given by

\[
\bar{p}(x) = \left(2\pi N a^2\right)^{-1/2} \exp\left(-\frac{x^2}{2 N a^2}\right).
\tag{A.7}
\]

The probability $p(\lambda|N)$ can be written as a functional integral over all possible probability distributions, of the probability
of a given distribution $P[p(x)]$, times a delta function which counts only those walks that have a given variance of $\lambda N a^2$:

\[
p(\lambda|N) = \int \mathcal{D}p(x)\, P[p(x)]\; \delta\!\left(\lambda - \frac{1}{N a^2}\int dx\, x^2\, p(x)\right).
\tag{A.8}
\]

The calculation is performed in Appendix B. The result for the probability distribution of anomalous variance $\lambda$ is:

\[
p(\lambda|N) = \sqrt{\frac{N}{6\pi}}\; e^{-N(\lambda-1)^2/6}.
\tag{A.9}
\]

We can see from equation (A.9) that the mean value of $\lambda$ is 1, meaning that a walk of $N$ steps has on average a variance
$N a^2$. However there is variance $\delta\lambda^2 = 3/N$ in the distribution, so that some walks are either particularly diffuse or condensed
statistically. The anomalous variance decreases monotonically with increasing $N$.
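
The moments of the Gaussian form $\sqrt{N/6\pi}\, e^{-N(\lambda-1)^2/6}$ are easy to confirm numerically. The following sketch integrates it on a uniform grid with the trapezoidal rule; the grid parameters and function names are arbitrary choices made for illustration.

```python
import math

def p_anomalous_variance(lam, n_steps):
    """Gaussian distribution of the anomalous variance lambda for a walk of n_steps."""
    return math.sqrt(n_steps / (6.0 * math.pi)) * math.exp(-n_steps * (lam - 1.0) ** 2 / 6.0)

def distribution_moments(n_steps, half_width=5.0, n_grid=20001):
    """Return (normalization, mean, variance) by trapezoidal integration around lambda = 1."""
    lo = 1.0 - half_width
    h = 2.0 * half_width / (n_grid - 1)
    norm = first = second = 0.0
    for i in range(n_grid):
        lam = lo + i * h
        weight = h * (0.5 if i in (0, n_grid - 1) else 1.0)  # trapezoid end-point weights
        p = p_anomalous_variance(lam, n_steps) * weight
        norm += p
        first += lam * p
        second += lam * lam * p
    mean = first / norm
    return norm, mean, second / norm - mean ** 2
```

For this Gaussian the variance works out to $3/N$, so for a 57-step walk the relative spread in $\lambda$ is about 23%, which is why finite-size fluctuations in density cannot be neglected for a protein of this size.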

For a walk in three-dimensions, we define $\lambda$ through the variance

\[
\Delta R^2 = \lambda N a^2.
\tag{A.10}
\]

From the definition of $\eta$ in equation (A.4), the parameter $\lambda$ depends on $\eta$ (and $N$) as

\[
\lambda(\eta) = \eta^{-2/3} N^{-1/3}.
\tag{A.11}
\]

The probability distribution of walks of density $\eta$ is then given by

\[
p(\eta|N) = p(\lambda(\eta)|N) \left|\frac{d\lambda}{d\eta}\right|
\tag{A.12}
\]

(the Jacobian is not particularly important here as it enters the entropy only logarithmically).

With the above definition in equation (A.10) for $\lambda$ in three-dimensions, $p(\lambda|N)$ remains unchanged from the one-dimensional
form in equation (A.9) (see Appendix B).

The conformational entropy for a chain of length $\ell$ having packing fraction $\eta$ is obtained from equations iA.H . jA.5> .( IX!^ .

\[
S(\eta,\ell) = \ln \Omega(\eta,\ell) \approx S_c(\eta,\ell) - \frac{\ell}{6}\left[\left(\frac{\bar{\eta}}{\eta}\right)^{2/3} - 1\right]^2,
\tag{A.13}
\]

where $\bar{\eta} = \ell^{-1/2}$ gives the most probable value for the packing fraction for an ideal (non-self-avoiding) chain of length $\ell$. For
an interacting chain, enthalpy and entropy must both be considered in finding the most-probable packing fraction, which is
obtained by minimizing the free energy with respect to $\eta$ (see equations (12) and (13)).
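
A short numerical scan illustrates how the finite-size term moves the entropy maximum away from zero density. The sketch assumes the bulk Flory entropy per link together with the Gaussian penalty $(\ell/6)[(\bar{\eta}/\eta)^{2/3} - 1]^2$ that follows from the distribution of anomalous variance; the value of $\nu$, the grid resolution and the function names are illustrative assumptions.

```python
import math

def flory_entropy_per_link(eta, nu):
    """Assumed bulk mean-field entropy per link at packing fraction eta."""
    return math.log(nu / math.e) - (1.0 - eta) / eta * math.log1p(-eta)

def loop_entropy(eta, ell, nu=6.0):
    """Entropy of a loop of ell links: bulk term minus the finite-size Gaussian penalty."""
    eta_bar = ell ** -0.5  # most probable packing fraction of an ideal chain
    penalty = (ell / 6.0) * ((eta_bar / eta) ** (2.0 / 3.0) - 1.0) ** 2
    return ell * flory_entropy_per_link(eta, nu) - penalty

def most_probable_packing_fraction(ell, nu=6.0, n_grid=2000):
    """Packing fraction that maximizes the loop entropy, found by a grid scan on (0, 1)."""
    etas = [(i + 0.5) / n_grid for i in range(n_grid)]
    return max(etas, key=lambda eta: loop_entropy(eta, ell, nu))
```

The maximum lies at a non-zero density somewhat below $\bar{\eta}$, because self-avoidance still favors dilution, and it shifts to lower density as the loop gets longer. The value of $\nu$ only adds a constant and does not affect the location of the peak.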

We still must find the dependence of loop length $\ell$ on the amount of native structure present. We proceed by making several
approximations for the quantities in equation iAM . The result is not sensitive to the exact values of these quantities. We
approximate the product over loop lengths in equation (A.6) by taking a saddle-point value for $\ell$, effectively letting all loops
have the typical loop length $\bar{\ell}(Q)$. Then $n(\ell|Q) = \delta(\ell - \bar{\ell}(Q))\, n_L(Q)$ where $n_L(Q)$ is the total number of loops at $Q$. The typical
loop length $\bar{\ell}(Q)$ is obtained from the total number of loops and the total number of disordered residues. We estimate the total
number of disordered residues as a linear function of $Q$: $N(1-Q)$. This is a mean-field approximation. In capillarity models,
the deviations from linearity scale as $N^{-1/3}$, but are of order unity for a typical size protein (see Appendix C). We estimate the
typical loop length $\bar{\ell}(Q)$ as the total number of disordered residues divided by the total number of loops:

\[
\bar{\ell}(Q) = \frac{N(1-Q)}{n_L(Q)}.
\tag{A.14}
\]

Generically for small native cores, the number of loops dressing the native core is proportional to the surface area of the
core, which goes as the number of native residues $NQ$ to the 2/3 power. However for large native cores (a nearly folded
protein), the unfolding nucleus consists of disordered protein, so that the number of constraints on loops within the core
(the surface entropy cost) is proportional to the number of non-native residues $N(1-Q)$ to the 2/3 power [4]. We linearly
interpolate between these two regimes to obtain

\[
\begin{aligned}
n_L(Q) &= (1-Q)\,[NQ]^{2/3} + Q\,[N(1-Q)]^{2/3} + 1 \\
&= N^{2/3}\,[Q(1-Q)]^{2/3} \left\{ Q^{1/3} + (1-Q)^{1/3} \right\} + 1 \\
&\approx N^{2/3}\,[Q(1-Q)]^{2/3} + 1,
\end{aligned}
\tag{A.15}
\]

where the expression in curly brackets is approximated as unity since it varies between 1 and about 1.6 over the range $0 < Q < 1$.
One loop must always be present so that $\bar{\ell}(Q)$ remains non-divergent, so we have explicitly added unity in equation (A.15).
Equations (A.14) and (A.15) together give the typical disordered loop length at $Q$ in the model. Equation (A.15) is consistent
with previous statements that the number of loops dressing the folding nucleus scales as $N^{2/3}$ [65], however here the
$Q$-dependence is made explicit. When $Q = 0$ or $Q = 1$, $n_L = 1$, and by equation (A.14) $\bar{\ell}(0) = N$ and $\bar{\ell}(1) = 0$, so the limits behave
sensibly.
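
The limits quoted above, and the size of the factor that is approximated as unity, can be verified directly. The sketch below implements both the interpolated loop count and its simplified form; the function names are hypothetical and the chain length used in the checks is the 57 residues of the simulated protein.

```python
def number_of_loops_exact(q, n_res):
    """Linear interpolation between the small-core and large-core regimes, plus one loop."""
    return (1.0 - q) * (n_res * q) ** (2.0 / 3.0) + q * (n_res * (1.0 - q)) ** (2.0 / 3.0) + 1.0

def number_of_loops(q, n_res):
    """Simplified loop count, with the curly-bracket factor set to unity."""
    return n_res ** (2.0 / 3.0) * (q * (1.0 - q)) ** (2.0 / 3.0) + 1.0

def curly_bracket(q):
    """Factor Q^(1/3) + (1 - Q)^(1/3) that is approximated as unity."""
    return q ** (1.0 / 3.0) + (1.0 - q) ** (1.0 / 3.0)

def typical_loop_length(q, n_res):
    """Number of disordered residues N (1 - Q) divided by the number of loops."""
    return n_res * (1.0 - q) / number_of_loops(q, n_res)
```

The bracket equals 1 at both end points and peaks at $2^{2/3} \approx 1.59$ for $Q = 1/2$, so dropping it changes the loop count by at most about 60% at mid-folding and not at all near the folded and unfolded states.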



The entropy of the disordered polymer at $Q$, $S(\eta,Q)$, is then given by $n_L(Q)\, S(\eta, \bar{\ell}(Q))$, or using equations iA.H . (A.13),

\[
S_c(Q,\eta) = N(1-Q) \left\{ \ln\frac{\nu}{e} - \left(\frac{1-\eta}{\eta}\right)\ln(1-\eta) - \frac{1}{6}\left[\left(\frac{\bar{\eta}(Q)}{\eta}\right)^{2/3} - 1\right]^2 \right\} = N(1-Q)\, s_{nn}(Q,\eta),
\tag{A.16}
\]

where $\bar{\eta}(Q) = \bar{\ell}(Q)^{-1/2} = \left[ n_L(Q) / N(1-Q) \right]^{1/2}$. In equation (A.16) the quantity in curly brackets is the entropy per residue for
the remaining disordered polymer at $Q$. Equation (A.16) scales extensively with chain length, which is a consequence of the
mean-field approximation made above.

APPENDIX B: CALCULATION OF THE PROBABILITY DISTRIBUTION OF ANOMALOUS VARIANCE



We again write the probability p{i,N) as a functional integral over all possible probability distributions, of the probability 
of a given distribution P[p(x)], times a delta function which counts only those walks that have a given variance of INa^: 

p{l,N) = J Vp(x)P[p(x)] s(l-j^l dxx^p(x)j . (B.l) 

To obtain P[p{x)] we imagine dividing the x-axis up into bins of width dx, where each bin is labeled by has coordinate 
Xi = idx, and we let 'p(xi)dx = p,-. The probability after trials or events, of a distribution of numbers {«, } across all the bins 
is a multinomial distribution of essentially infinitely many variables 

N\ _ _ 

P{ni}= j — j P'lP?--- (B-2) 

.. .rill 1121 ■■ ■ 

Expanding the log of to second order, subject to the constraint that J2 "/ = ^^d using Stirling's formula, gives 

{Y[2nNni-p^)'j ' exp(-E|g?^) (B.3) 

This is the distribution in the limit of large N. We apply it with the understanding that when is not so large the distribution is 
an approximate solution. The approximation is best where «, is the largest, which is where the distribution is most appreciable. 
In the continuum limit /?{«,} P[p{x)], so that equation ( IB.H can be written as 

pd^N) dke-"'^ J Vp(x) e^^xCip^^k) ^34^ 

where we have Fourier transformed the delta function. The effective Lagrangian here is 

L{p,x,k) = -N^-E^zMl + ik^pix) (B.5) 

where we have used the fact that the probability to be within a given slice of width dx is small. 

The functional integral amounts to finding the extremum of the effective action in the exponent. The extremal probability is 
$p^*(x) = \bar{p}(x) + \frac{ik x^2}{N a^2}\,\bar{p}(x)$ and the extremal action is $S^*(k) = \int dx\, \mathcal{L}(p^*,x,k) = -\frac{3}{2N}k^2 + ik$. The integral over k is then a simple 
Gaussian integral, so the result for the probability of anomalous variance is 

$$p(\lambda,N) = \sqrt{\frac{N}{6\pi}}\; e^{-N(\lambda-1)^2/6}. \tag{B.6}$$
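
The extremum and the Gaussian integral leading to (B.6) can be spelled out step by step. This is only a sketch: it assumes the reference step distribution $\bar{p}(x)$ is a normalized Gaussian with $\langle x^2\rangle = a^2$ and $\langle x^4\rangle = 3a^4$, and that the effective Lagrangian is quadratic in $p - \bar{p}$ as written in the first line. The symbol $\bar{p}$ and the averages $\langle\cdot\rangle$ taken over it are notation introduced here for illustration.

```latex
\begin{aligned}
\mathcal{L}(p,x,k) &= -\frac{N}{2}\,\frac{[p(x)-\bar p(x)]^2}{\bar p(x)} + ik\,\frac{x^2}{a^2}\,p(x), \\
\frac{\partial \mathcal{L}}{\partial p} = 0 \;&\Rightarrow\; p^*(x) = \bar p(x)\left[1 + \frac{ik\,x^2}{N a^2}\right], \\
S^*(k) &= \int dx\, \mathcal{L}(p^*,x,k)
        = ik\,\frac{\langle x^2\rangle}{a^2} - \frac{k^2}{2N}\,\frac{\langle x^4\rangle}{a^4}
        = ik - \frac{3}{2N}\,k^2, \\
p(\lambda,N) &= \int \frac{dk}{2\pi}\; e^{-ik(\lambda-1) - 3k^2/(2N)}
        = \sqrt{\frac{N}{6\pi}}\; e^{-N(\lambda-1)^2/6}.
\end{aligned}
```

The last step uses $\int dk\, e^{-\alpha k^2 - i\beta k} = \sqrt{\pi/\alpha}\; e^{-\beta^2/4\alpha}$ with $\alpha = 3/(2N)$ and $\beta = \lambda - 1$.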

For a walk in three dimensions, there are three parameters characterizing anomalous variance in x, y, and z. Since, e.g., steps 
in y are uncorrelated with those in x, the probability of finding parameters $\lambda_x$, $\lambda_y$, and $\lambda_z$ is the product of three terms, each of 
the form (B.6) but formally with 1/3 the number of steps in each of the three dimensions: 

$$p(\lambda_x,\lambda_y,\lambda_z,N) = p(\lambda_x,N/3)\, p(\lambda_y,N/3)\, p(\lambda_z,N/3). \tag{B.7}$$

The variance $\Delta R^2$ is given by 

$$\Delta R^2 = \Delta x^2 + \Delta y^2 + \Delta z^2 = \frac{N a^2}{3}\,(\lambda_x + \lambda_y + \lambda_z) \equiv \lambda N a^2, \tag{B.8}$$
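so that we seek the probability distribution $p(\lambda,N)$ of $\lambda = (\lambda_x + \lambda_y + \lambda_z)/3$. This is given by 

$$\begin{aligned}
p(\lambda,N) &= \int d\lambda_x\, d\lambda_y\, d\lambda_z\; p(\lambda_x,\lambda_y,\lambda_z,N)\; \delta\!\left(\lambda - \tfrac{1}{3}(\lambda_x+\lambda_y+\lambda_z)\right) \\
&= \int d\lambda_x\, d\lambda_y\; 3\left(\frac{N}{18\pi}\right)^{3/2} \exp\!\left\{-\frac{N}{18}\left[(\lambda_x-1)^2 + (\lambda_y-1)^2 + (3\lambda-\lambda_x-\lambda_y-1)^2\right]\right\} \\
&= \sqrt{\frac{N}{6\pi}}\; e^{-N(\lambda-1)^2/6},
\end{aligned} \tag{B.9}$$

as in the one-dimensional case. 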
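
The statement that the three-dimensional average recovers the one-dimensional form can be checked numerically. The sketch below assumes the Gaussian form $p(\lambda,N) = \sqrt{N/6\pi}\, e^{-N(\lambda-1)^2/6}$ for (B.6), i.e. mean 1 and variance 3/N; the function names, the value N = 60 and the sample size are illustrative choices, not taken from the paper.

```python
import math
import random


def p_lambda(lam, n_steps):
    """Gaussian form assumed for Eq. (B.6): mean 1, variance 3/n_steps."""
    prefactor = math.sqrt(n_steps / (6.0 * math.pi))
    return prefactor * math.exp(-n_steps * (lam - 1.0) ** 2 / 6.0)


def normalization(n_steps, lo=-4.0, hi=6.0, bins=20000):
    """Midpoint-rule integral of p_lambda over [lo, hi]."""
    h = (hi - lo) / bins
    return h * sum(p_lambda(lo + (i + 0.5) * h, n_steps) for i in range(bins))


def sample_lambda_3d(n_steps, rng):
    """Draw lambda = (lx + ly + lz)/3, each component from (B.6) with N/3 steps."""
    sd_component = math.sqrt(3.0 / (n_steps / 3.0))
    return sum(rng.gauss(1.0, sd_component) for _ in range(3)) / 3.0


N = 60
rng = random.Random(0)
samples = [sample_lambda_3d(N, rng) for _ in range(100000)]
mean_3d = sum(samples) / len(samples)
var_3d = sum((s - mean_3d) ** 2 for s in samples) / len(samples)
norm_1d = normalization(N)
print(norm_1d, mean_3d, var_3d, 3.0 / N)
```

If the product form (B.7) and the definition of $\lambda$ are as assumed, the sampled variance should approach 3/N, the variance of the one-dimensional expression with the full N steps.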



APPENDIX C: NUMBER OF DISORDERED RESIDUES FOR A GIVEN NUMBER OF NATIVE CONTACTS 

We wish to find the number of disordered residues when a fraction Q of native contacts are present. Equivalently we can 
find the number of ordered (native) residues. In the capillarity model this is the number of residues $N_{NUC}$ in the nucleus. The 
number of native interactions at Q can be written as the total number of residues times the mean number of interactions 
per residue in the native structure times the fraction of possible native interactions Q. The number of native interactions 
in a capillarity nucleus is the number of interactions in a fully collapsed (Hamiltonian) walk [4J, which has bulk and surface 
contributions, giving the equation 

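$$Q\, N z_N = z_B \left(N_{NUC} - \sigma N_{NUC}^{2/3}\right), \tag{C.1}$$

where $N$ is the total number of residues, $z_N$ is the mean number of native interactions per residue in the native structure, $z_B$ is the number of native interactions per residue in a nucleus of infinite size, and $\sigma$ is the mean fraction of the $z_B$ 
interactions lost at the surface. In the absence of roughening, $\sigma$ is a very weak function of nucleus size and is of order unity. For walks 
on a 3-D cubic lattice, $\sigma = 1.5$. 
In our problem we know the number of native interactions, $N z_N$. We can find $z_B$ by solving (C.1) with $Q = 1$ and $N_{NUC} = N$: 

$$z_B = \frac{z_N}{1 - \sigma N^{-1/3}}. \tag{C.2}$$
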
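The number of native residues $N_{NUC}$ in a capillarity model as a function of Q is then given by the solution of 

$$N_{NUC} - \sigma N_{NUC}^{2/3} = \left(N - \sigma N^{2/3}\right) Q. \tag{C.3}$$

Equation (C.3) is a cubic equation in $N_{NUC}^{1/3}$, with solution of the form 

$$N_{NUC}(Q) = \left[\frac{1}{3}\left(\sigma + A^{1/3} + \frac{\sigma^2}{A^{1/3}}\right)\right]^3, \tag{C.4}$$

where 

$$A = \tfrac{1}{2}\left(B + \sqrt{B^2 - 4\sigma^6}\right), \qquad B = 2\sigma^3 + 27 N Q - 27 N^{2/3} Q \sigma.$$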

Along with the average loop length, the total number of disordered residues determines the number of loops at Q. A plot of 
the total number of disordered residues for both the capillarity model and the linear approximation is shown in figure 11. One 
can see from the figure that a linear approximation for the number of disordered residues is a good one. 
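
The comparison between the capillarity result and the linear approximation can be reproduced with a few lines of code. The sketch below assumes that the nucleus size obeys $N_{NUC} - \sigma N_{NUC}^{2/3} = (N - \sigma N^{2/3})Q$ and solves the resulting cubic in $N_{NUC}^{1/3}$ with Cardano's formula; the function names `n_nuc` and `residual` are hypothetical, and the values N = 100 and $\sigma = 1$ are illustrative.

```python
import math


def n_nuc(q, n, sigma):
    """Real root of N_nuc - sigma*N_nuc**(2/3) = (n - sigma*n**(2/3))*q.

    With x = N_nuc**(1/3) this is the cubic x**3 - sigma*x**2 - c = 0.
    Intended for sigma > 0 and 0 <= q <= 1.
    """
    c = (n - sigma * n ** (2.0 / 3.0)) * q
    b = 2.0 * sigma ** 3 + 27.0 * c
    # b**2 - 4*sigma**6 written in a form that cannot go negative by rounding.
    disc = 27.0 * c * (27.0 * c + 4.0 * sigma ** 3)
    a = 0.5 * (b + math.sqrt(disc))
    cube_root_a = a ** (1.0 / 3.0)
    x = (sigma + cube_root_a + sigma ** 2 / cube_root_a) / 3.0
    return x ** 3


def residual(q, n, sigma):
    """Left-hand side minus right-hand side of the defining equation."""
    nn = n_nuc(q, n, sigma)
    return nn - sigma * nn ** (2.0 / 3.0) - (n - sigma * n ** (2.0 / 3.0)) * q


n_res, sig = 100, 1.0
for q in (0.1, 0.5, 0.9):
    capillarity = n_res - n_nuc(q, n_res, sig)  # disordered residues, capillarity
    linear = n_res * (1.0 - q)                  # disordered residues, mean field
    print(q, round(capillarity, 2), round(linear, 2))
```

At Q = 1 the root reduces to $N_{NUC} = N$, so both estimates of the number of disordered residues vanish together.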



APPENDIX D: SIMULATION MODEL AND METHOD 



We introduce non-native interactions to an otherwise energetically unfrustrated $C_\alpha$ model of the SH3 domain of src tyrosine-protein 
kinase (src-SH3). The energetically unfrustrated model is obtained by applying a Go-like Hamiltonian [66] to an off-lattice 
minimalist representation of the src-SH3 native structure (PDB code 1fmk, segment 84-140). We have previously shown 
that this topology-based model is able to correctly reproduce the folding mechanism of small, fast-folding proteins [25, 26]. 
A standard Go-like Hamiltonian takes into account only native interactions, and each of these interactions contributes to the 
energy with the same weight. Protein residues are represented as single beads centered at their $C_\alpha$ positions. Adjacent beads 
are strung together into a polymer chain by means of bond and angle interactions. The geometry of the native state is encoded 
in the dihedral angle potential and a non-local potential. The Go-like energy of a protein in a configuration $\Gamma$ (with native 
state $\Gamma_0$) is given by the expression: 



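$$\begin{aligned}
E(\Gamma,\Gamma_0)_{Go} ={}& \sum_{\mathrm{bonds}} K_r\,(r - r_0)^2 + \sum_{\mathrm{angles}} K_\theta\,(\theta - \theta_0)^2 && \text{(D.1)} \\
&+ \sum_{\mathrm{dihedral}} K_\phi^{(n)} \left[1 + \cos\big(n \times (\phi - \phi_0)\big)\right] && \text{(D.2)} \\
&+ \sum_{\text{non-local } (i,j)} \left\{ \epsilon_1(i,j) \left[ 6\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{10} - 5\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} \right] + \epsilon_2(i,j) \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} \right\}, && \text{(D.3)}
\end{aligned}$$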



where $r$ and $r_0$ represent the distances between two subsequent residues in, respectively, the configuration $\Gamma$ and the native 
state $\Gamma_0$. Analogously, $\theta$ ($\theta_0$) and $\phi$ ($\phi_0$) represent the angles formed by three subsequent residues, and the dihedral angles 
defined by four subsequent residues, in the configuration $\Gamma$ ($\Gamma_0$). The dihedral potential consists of a sum of two terms for 
every four adjacent $C_\alpha$ atoms, one with period $n = 1$ and one with $n = 3$. The last term in equation (D.3) contains the non-local 
native interactions and a short-range repulsive term for non-native pairs (i.e., $\epsilon_1(i,j) = \text{constant} < 0$ and $\epsilon_2(i,j) = 0$ if i-j is 
a native pair, while $\epsilon_1(i,j) = 0$ and $\epsilon_2(i,j) = \text{constant} > 0$ if i-j is a non-native pair). The parameter $\sigma_{ij}$ is taken equal to the 
i-j native distance for native interactions, while $\sigma_{ij} = 4$ Å for non-native pairs. The parameters $K_r$, $K_\theta$, $K_\phi$, and $\epsilon$ weight the relative 
strength of each kind of interaction entering the energy, and they are taken to be $K_r = 100\epsilon$, $K_\theta = 20\epsilon$, $K_\phi^{(1)} = \epsilon$, and $K_\phi^{(3)} = 0.5\epsilon$. 
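
The behavior of the non-local term can be illustrated with a small function. This is a sketch under an explicit assumption: the native attraction is taken to be the 10-12 form $\epsilon_1[6(\sigma/r)^{10} - 5(\sigma/r)^{12}]$, whose coefficients are fixed by requiring a minimum of depth $|\epsilon_1|$ at $r = \sigma_{ij}$, and the non-native term is the pure repulsion $\epsilon_2(\sigma/r)^{12}$. The function name and the default values $\epsilon_1 = -1$, $\epsilon_2 = +1$ are illustrative.

```python
def go_nonlocal_energy(r, sigma, native, eps1=-1.0, eps2=1.0):
    """Energy of one non-local pair at distance r (same length units as sigma).

    native=True  -> assumed 10-12 attraction with its minimum eps1 (< 0) at r = sigma
    native=False -> short-range repulsion eps2 * (sigma / r)**12
    """
    x = sigma / r
    if native:
        return eps1 * (6.0 * x ** 10 - 5.0 * x ** 12)
    return eps2 * x ** 12


# A native pair with a 6.0 A native distance, probed around its minimum.
for r in (5.7, 6.0, 6.3):
    print(r, round(go_nonlocal_energy(r, 6.0, native=True), 3))

# A non-native pair uses sigma = 4 A and is purely repulsive.
print(go_nonlocal_energy(4.0, 4.0, native=False))
```

With this choice every native contact contributes the same energy at its native distance, which is the property the text attributes to a standard Go-like Hamiltonian.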

We introduce a progressively increasing perturbation to the Go-like Hamiltonian by replacing the short-range repulsive term 
in equation (D.3) with attractive or repulsive pairwise interactions $V_{nn}(r_{ij})$ of the form: 



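$V_{nn}(r_{ij})$ = [piecewise expression in $(\sigma_{ij}/r_{ij})^{12}$ and $(\sigma_{ij}/r_{ij})^{20}$, with separate branches for $r_{ij} < r_N$ and $r_{ij} > r_N$; coefficients illegible in the source]    (D.4) 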



Figure 12 shows non-native interactions for different values of the interaction strength $\eta$. The strength $\eta_{ij}$ for each non-native 
pair (i, j) is extracted randomly from a Gaussian distribution with mean $\epsilon_{NN}$ and variance $b^2$. The parameter $\sigma_{ij}$ in expression 
(D.4) is kept equal to 4 Å for all non-native interactions, in order to recover the plain Go-like Hamiltonian (equation (D.3)) in the 
limit $b \to 0$, $\epsilon_{NN} \to 0$. The parameter $r_N$ is set to $r_N = [\ldots]\,\sigma_{ij}$. The selected values for $\sigma_{ij}$ and $r_N$ allow non-native contacts to 
form in the range of $r_{ij} \approx 4$-5 Å. The total energy of a configuration $\Gamma$ (with a native state $\Gamma_0$), corresponding to a non-native 
perturbation strength b, is thus: 



$$E(\Gamma,\Gamma_0)_b = E(\Gamma,\Gamma_0)_{Go} + \sum_{\text{non-native } (i,j)} V_{nn}\big(r_{ij}, \{\eta_{ij}\}\big), \tag{D.5}$$



where $\{\eta_{ij}\}$ is a set of quenched variables randomly distributed as described above. The case $b = 0$, $\epsilon_{NN} = 0$ corresponds to 
the unperturbed Go-like representation of the protein, as studied in refs. [25, 26], and we use it as the reference case 
for comparing folding rates and folding mechanisms. Sequences with different amounts of non-native energy are defined 
by progressively increasing the parameter b in the interval $[0, 2]\epsilon$ while keeping $\epsilon_{NN} = 0$, or by varying the parameter $\epsilon_{NN}$ in 
the interval $[-1, 1]\epsilon$. 
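
Generating one realization of the quenched strengths can be sketched as follows. The code draws one Gaussian number per non-native pair, with mean $\epsilon_{NN}$ and standard deviation b, and keeps it fixed thereafter. The helper names, the minimum sequence separation of four residues, the empty native-contact set and the seed are hypothetical choices made only for illustration.

```python
import random


def nonnative_pairs(n_res, native_pairs, min_sep=4):
    """All pairs (i, j), j - i >= min_sep, that are not native contacts.

    min_sep=4 is an assumed sequence-separation cutoff for this sketch.
    """
    return [(i, j)
            for i in range(n_res)
            for j in range(i + min_sep, n_res)
            if (i, j) not in native_pairs]


def draw_strengths(pairs, eps_nn, b, seed=0):
    """One quenched realization: eta_ij ~ Gaussian(mean=eps_nn, std=b)."""
    rng = random.Random(seed)
    return {pair: rng.gauss(eps_nn, b) for pair in pairs}


pairs = nonnative_pairs(57, native_pairs=set())  # placeholder: no native map loaded
eta = draw_strengths(pairs, eps_nn=0.0, b=1.0)
values = list(eta.values())
mean_eta = sum(values) / len(values)
std_eta = (sum((v - mean_eta) ** 2 for v in values) / len(values)) ** 0.5
print(len(pairs), round(mean_eta, 3), round(std_eta, 3))
```

Setting b = 0 returns the same value $\epsilon_{NN}$ for every pair, which corresponds to the unperturbed reference case when $\epsilon_{NN} = 0$.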

The native contact map of a protein is obtained by using the approach described in ref. [67]. Native contacts between 
pairs of residues (i, j) with j < i+3 are discarded from the native map, as any three and four subsequent residues are already 
interacting in the angle and dihedral terms. A contact between two residues (native or non-native) is considered formed 
if the distance between the $C_\alpha$'s is shorter than $\gamma$ times their equilibrium distance $\sigma_{ij}$ (where $\sigma_{ij}$ = native distance for a native 
pair, and $\sigma_{ij} = 4$ Å for a non-native pair). It has been shown [68] that the results are not strongly dependent on the choice 
made for the cut-off distance $\gamma$. We have chosen $\gamma = 1.2$ as in refs. [25, 26]. We have used constant-temperature Molecular 
Dynamics (MD) for simulating the kinetics and thermodynamics of the protein models. We employed the simulation package 
AMBER (Version 6) [69] and the Berendsen algorithm for coupling the system to an external bath [70]. 
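
The contact criterion translates directly into code. The sketch below counts a pair as formed when its distance is below $\gamma\,\sigma_{ij}$ with $\gamma = 1.2$ and returns the fraction of formed contacts in a given set, which plays the role of Q for native pairs. The function names and the toy distances are hypothetical.

```python
def contact_formed(r, sigma, gamma=1.2):
    """True if the pair distance r is shorter than gamma times sigma."""
    return r < gamma * sigma


def fraction_formed(distances, sigmas, gamma=1.2):
    """Fraction of the pairs in `sigmas` whose contact is formed.

    distances: dict mapping (i, j) -> current distance
    sigmas:    dict mapping (i, j) -> equilibrium distance sigma_ij
    """
    formed = sum(1 for pair, sigma in sigmas.items()
                 if contact_formed(distances[pair], sigma, gamma))
    return formed / len(sigmas)


# Toy example with three non-native pairs (sigma = 4 A, so the cutoff is 4.8 A).
toy_sigmas = {(0, 5): 4.0, (1, 7): 4.0, (2, 9): 4.0}
toy_distances = {(0, 5): 4.5, (1, 7): 6.0, (2, 9): 4.9}
toy_fraction = fraction_formed(toy_distances, toy_sigmas)
print(toy_fraction)
```

For native pairs the same function would be called with $\sigma_{ij}$ set to the native distances.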

For each Hamiltonian (obtained for different values of the parameter b), several constant temperature simulations were 
combined using the WHAM algorithm [71, 72] to generate a specific heat profile versus temperature and a free energy F(Q) as 
a function of the folding reaction coordinates Q and A. In order to compute folding rates, several (typically 250) simulations 
are performed at the estimated folding temperature for each different sequence. The folding time t is then defined as the 
average time interval between two subsequent unfolding and folding events over this set of simulations. The time length of 
a typical simulation is about 5 x 10*" MD time steps. In this time range 2 to 5 folding events are normally observed for the 
unperturbed Go-like protein model. 
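
One way to extract such folding times from a trajectory of Q is sketched below. It assumes that an unfolding event is the first visit to $Q \le Q_{unfolded}$ after a stay in the folded basin, and that the next folding event is the first subsequent visit to $Q \ge Q_{folded}$. The thresholds 0.2 and 0.8, the function name and the toy trajectory are hypothetical and are not taken from the paper.

```python
def folding_times(q_traj, q_unfolded=0.2, q_folded=0.8, dt=1.0):
    """Intervals between each unfolding event and the next folding event.

    q_traj: sequence of Q values sampled every dt time units.
    Two thresholds are used so that recrossings near the barrier top
    are not counted as separate events.
    """
    times = []
    state = None        # "F", "U", or None before the first basin visit
    t_unfold = None     # step index of the latest unfolding event
    for step, q in enumerate(q_traj):
        if q >= q_folded:
            if state == "U" and t_unfold is not None:
                times.append((step - t_unfold) * dt)
            state = "F"
            t_unfold = None
        elif q <= q_unfolded:
            if state == "F":
                t_unfold = step
            state = "U"
    return times


toy_traj = [0.9, 0.9, 0.5, 0.1, 0.1, 0.5, 0.9, 0.1, 0.3, 0.85]
toy_times = folding_times(toy_traj)
mean_folding_time = sum(toy_times) / len(toy_times)
print(toy_times, mean_folding_time)
```

Averaging the collected intervals over many independent runs then gives an estimate of the folding time, and its inverse gives the folding rate.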

The errors (reported as error bars in the plots) on the estimates of thermodynamic quantities and folding rates are obtained by 
computing these quantities from several (more than 100) uncorrelated sets of simulations and then considering the dispersion 
of values obtained for the same quantity. 
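
The error estimate described above amounts to computing the spread of a quantity across independent sets of simulations. A minimal sketch is given below; it assumes the dispersion is measured by the sample standard deviation, and the function name and numbers are illustrative.

```python
import statistics


def estimate_with_error(values_per_set):
    """Mean and sample standard deviation of one quantity across independent sets."""
    return statistics.mean(values_per_set), statistics.stdev(values_per_set)


# Toy example: the same quantity estimated from five uncorrelated sets of runs.
toy_estimates = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_mean, toy_error = estimate_with_error(toy_estimates)
print(toy_mean, toy_error)
```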

TABLES AND TABLE CAPTIONS 



Symbol            Meaning                                        Equation (a)   Simulation values
$N$               Total number of residues in the protein        [...]          57
$M$               Total contacts in a fully collapsed globule    [...]          142
$z$               Average number of contacts per residue         [...]          2.49
$\epsilon$        Native energy per contact                      [...]          -1.0
$E_N$             Energy in the native state                     [...]          -142.0
$\ln \nu$         Maximal entropy per residue                    [...]          ~2.4
$\epsilon_{NN}$   Mean energy of non-native contacts             [...]          0.0
$b^2$             Energetic variance of non-native contacts      [...]          [0.0 - 4.0]
$T_f^0$           Go folding temperature (in energy units)       [...]          ~1.07

(a) Equation where the symbol is first defined, or representative equation. 

TABLE I: Table of values for parameters in the model 

FIGURE CAPTIONS 



FIGURE 1. Free energy (first column from the left), energy (second column), and entropy (third column) surfaces as functions 
of the fraction Q of native contacts and the fraction A of non-native contacts, as obtained from theory and simulations. 
Also shown (fourth column) is the fraction of states, n(E), populated as a function of the energy E: the distribution of the energy in the 
unfolded ensemble is shown, along with the distribution in the native state. The distribution n(E) is normalized, 
i.e., the integral of n(E) over all energies is 1 in all the cases plotted here. All free energy contours are spaced at about $1\epsilon$ 
(where $\epsilon$ is the energy per native contact). Values of the parameters are given in table I. 

Top row: Theoretical free energy, energy, and entropy surfaces at the folding temperature, obtained from equations (7c) and 
(A.16) with all the parameters set equal to the corresponding simulation values (see table I) and $b_{theory} = 0.3\epsilon$, where $\epsilon$ is the 
energy per native contact (this corresponds to $0.9\epsilon < b < 1.3\epsilon$ in the simulations; see text for details). The transition state has 
more non-native contacts than the unfolded state. The difference in the theoretical model is $\Delta A^{\ddagger} \approx 0.035$. This amounts to 
an increase in the total number of non-native contacts of $M \Delta A^{\ddagger} \approx 5$. The barrier height is about $3.47\epsilon \approx 3.3\, k_B T_f$. 
Bottom 3 rows: corresponding results obtained from simulations, for three different values of the non-native energy perturbation 
parameter b: $b = 0.5\epsilon$ (second row), $b = 0.9\epsilon$ (third row), and $b = 1.3\epsilon$ (bottom row). Barrier heights and values of $\Delta A^{\ddagger}$ 
obtained in simulations are plotted in figure 10 as a function of the non-native energy perturbation parameter b. 



FIGURE 2. The entropy per residue $s_{nn}(Q,\eta) = S_c(Q,\eta)/[N(1-Q)]$ in equation (A.16), for the disordered part of a protein of 
nativeness Q, as a function of the disordered polymer's packing fraction $\eta$. 



FIGURE 3. The most probable packing fraction $\eta^*$ is a monotonically increasing function of nativeness Q. The dashed 
curve shows the characteristic packing fraction when the disordered loops are assumed to obey ideal chain statistics. The 
solid curve accounts for the effects of excluded volume, which are included in equation (8). Inset: The most probable packing 
fraction is a decreasing function of the mean disordered loop length in equation (A.14). 



FIGURE 4. Fractional change in the number of non-native interactions as a function of nativeness Q, for the theoretical 
model. The number of non-native interactions initially increases before decreasing. For the model considered 
here, the barrier position is well within the region of Q values where the number of non-native interactions has increased. 
There are generically more non-native interactions present in the transition state than in the unfolded state, for strongly 
minimally frustrated proteins. The effect is fairly modest: for a hundred-residue protein there are about 6 more non-native 
interactions in the transition state. The shape of the curve is obtained by setting $\partial F/\partial A|_Q = 0$ in the first row of figure 1. 



FIGURE 5. Probability of formation of non-native contacts in the native configuration of SH3. Black dots in the contact map 
represent native contacts; non-native contacts formed with probability higher than 0.25 are color-coded according to the grayscale 
on top. Probability values are computed by averaging the formation of non-native contacts over more than 50,000 configurations 
with Q > 0.9 from folding/unfolding simulations. The data shown in this figure are for a non-native perturbation strength 
$b/\epsilon = 1.3$. Similar results are obtained for different values of the parameter b (see figure 6). 



FIGURE 6. (a) The lower panel shows the maximum number of non-native contacts registered in simulations for different 
values of the perturbation parameter b (in units of $\epsilon$). Black circles indicate the maximum in the reaction coordinate A, 
while filled gray dots correspond to the corrected coordinate A' (see text for details). The maximum number of all contacts 
(both native and non-native) is shown in the upper panel, for different values of b. Empty black squares indicate the maximum 
value obtained when all contacts separated by at least three residues along the sequence are considered (i.e., Q+A); filled gray 
squares correspond to the values obtained when non-native contacts likely to be formed in the native structure are removed 
(i.e., Q+A'). 

(b) Right panel: Average number of non-native contacts $\langle A \rangle$ formed in simulations as a function of the number of native contacts 
formed, for a perturbation parameter $b/\epsilon = 0.5$. Horizontal bars at the maximum of $\langle A \rangle$ correspond to the standard deviation 
around the average at the peak value. The black curve corresponds to the coordinate A, while the gray line corresponds to A'. The values of 
Q corresponding to the maximum of $\langle A \rangle$ and $\langle A' \rangle$ are shown in the left panel, for different values of the perturbation parameter 
b (black circles and filled gray dots, respectively). 



FIGURE 7. (a) Upper panel: Continuous curves illustrate the behavior of $\langle A \rangle$ vs. Q as predicted by the theory (with all 
parameters set to the simulation values, see table I). The different curves correspond to values of $b/\epsilon$ = 0.1, 0.3, 0.5, 0.7, 0.9 
(increasing values of b lead to higher values of $\langle A \rangle$). The thick black line represents the maximum value of A allowed in 
the theory at different values of Q (independent of the value of b). Lower panel: $\langle A' \rangle$ vs. Q (continuous curves) as obtained 
from simulations, for values of $b/\epsilon$ in [0.2, 1.6] (increasing values of b lead to higher values of $\langle A' \rangle$). Dotted curves represent 
the highest values of A' found in simulations at different values of Q, for the same values of $b/\epsilon$. The maximum value of A 
allowed in the theory is also plotted for comparison (thick black line). 

(b) Filled gray dots show the maximum value of A' detected in all equilibrium and quenched simulations for many values of b 
(see text), as a function of the fraction of native contacts formed, Q. Black circles correspond to the maximum packing fraction 
of the non-native part of the protein, as obtained by using equation (A.1), i.e., $\eta_{max} = A'_{max}/(1 - Q)$. Dotted lines show the 
maximum values for A (in black) and $\eta$ (in gray) allowed in the theory. Continuous lines in the corresponding colors represent 
the best fit of the data to a phenomenological exponential decay of $A'_{max}$ at small values of Q: $A'_{max} = (1 - Q)[1 - \exp(-Q/Q_c)]$. 
Regression analysis yields $Q_c = 0.12$. The best fit for $A'_{max}$ is shown in gray, and that for $\eta_{max}$ in black. 



FIGURE 8. (a) Heat capacity as a function of temperature, as obtained from simulations for different values of the parameter 
b. Temperature is measured in units of native energy per contact, $\epsilon$. (b) Free energy as a function of the fraction Q of native 
contacts, as obtained from simulations for several different values of the non-native energy perturbation parameter b. Free 
energy curves for all values of b shown in (b) are obtained at their corresponding folding temperatures $T_f(b)$ (estimated 
from the heat capacity curves, plotted in (a)), while all curves in (c) are at the folding temperature of the unperturbed case, 
$T_f^0 = T_f(b = 0)$. 



FIGURE 9. (a) Folding temperature $T_f$ (black circles) and glass temperature $T_g$ (filled gray dots), from simulation of the 
perturbed Go-model, as a function of the non-native energy perturbation strength b (with $\epsilon_{NN} = 0$). Dotted lines represent 
the theoretical prediction for $T_f$ (black line) and $T_g$ (gray line), when all the parameters of the theory are set equal to the 
simulation parameters (see table I). Dashed lines represent the best fit of the simulation data to the theoretical prediction (see 
(b)). Temperatures and energies are measured in units of $\epsilon$, the native energy per contact. The folding temperature is almost 
constant in the range shown, while the glass temperature rises from zero (plain Go model, with no energetic frustration) to 
values close to $T_f$ for a high level of non-native perturbation. As $T_g/T_f$ approaches 1, several non-native low-energy states 
compete with the native state and the folding is dramatically slowed down. Moreover, as $T_g/T_f \to 1$ the system is no longer 
self-averaging and different realizations of the non-native perturbation can lead to different results. There is a wider range 
where $T_f/T_g > 1$ in the simulations than in the theory, indicating a larger range of b where rate enhancement effects may be 
seen. (b) The same destabilizing effect on the folding predicted in the theory (in terms of $T_f/T_g$), for a given value of the 
parameter b, is observed in simulations for a much larger value of b. All values of b used in simulation ($b_{sim}$) are plotted in 
this figure as a function of the values of b yielding the same glass temperature in the theory ($b_{theory}$) (see text for details). The 
dashed line represents the best fit of the data to the expression $b_{sim} = \alpha\, b_{theory} + \beta$, for $b_{sim} > 0.8$. The result from this fit is also 
shown in (a). 



FIGURE 10. (a) Average difference in the number of non-native contacts in the transition state and unfolded state ensemble, 
as detected from simulation, as a function of the perturbation parameter $b/\epsilon$. The values obtained by considering all 
non-native contacts are shown as dark circles, while filled gray dots correspond to the corrected values (i.e., considering only 
non-native contacts that are not formed in the folded state; see text for details). The averages of these quantities over all the 
considered values of $b/\epsilon$ are plotted as continuous straight lines of the same color. These numbers are comparable to the 
theoretical estimate of $\sim 6$ more non-native contacts in the transition state for a 100-residue protein (see figure 3). 
(b) Barrier height $\Delta F^{\ddagger}$ (black circles) and log folding rate k (filled gray dots) as a function of the non-native energy perturbation 
strength b (for $\epsilon_{NN} = 0$), and (c) log folding rate k as a function of the average non-native energy $\epsilon_{NN}$ (for b = 0). 
The parameters controlling the strength of the non-native energy, $b^2/2T$ and $\epsilon_{NN}$, enter the free energy on the same footing in 
the theoretical model (see equation (7c)). Results from simulations are in very good agreement with the theoretical prediction 
(dashed black curve, in panels (b) and (c)) obtained when the value of $\Delta A'^{\ddagger}$, shown in (a), is used as an estimate of 
$\Delta A^{\ddagger}$ entering equation (16) predicted by the theory. Values of $\ln k$ and $\Delta F^{\ddagger}$ are normalized to the corresponding 
values for the unperturbed case ($\ln k_0$ and $\Delta F^{\ddagger}_0$). For large non-native energy perturbations ($b > 1$, or $\epsilon_{NN} > 0.5$) both 
$\Delta F^{\ddagger}(T_f^0)$ and $\ln k$ rapidly decrease (see also figure 8(b) and (c)). The energy parameters b and $\epsilon_{NN}$ are measured in units of 
native energy per contact, $\epsilon$. Barrier heights are measured in units of the folding temperature for the unperturbed case ($k_B T_f^0$). 



FIGURE 11. Plot of the amount of disordered polymer in the protein as a function of Q, for a mean-field model which has 
the form $N(1 - Q)$, and for a capillarity model which has the form $N - N_{NUC}$, where $N_{NUC}$ is given in equation (C.4). The inset 
shows the difference on a magnified scale. Here the chain length N = 100, and the mean fraction $\sigma$ of interactions lost on the 
surface of the capillarity nucleus (see equation (C.1)) is taken numerically to be 1.0. For systems of size 100 the deviation is 
only a few percent; the relative deviation goes as $N^{-1/3}$. 



FIGURE 12. Non-native interactions for increasing interaction strength (regulated by the parameter $\eta$, see equation (D.4)), 
from highly repulsive to highly attractive (thin curves). Curves for each value of $\eta$ indicate a $1\sigma$ width in the non-native 
potentials. The unperturbed potential (short-range repulsive term in equation (D.3)) is plotted as a reference (thick curve). 
Energies are measured in units of native energy per contact, $\epsilon$. 